From 76e3dd46803ca7d41ad0987f963c8aae36d07799 Mon Sep 17 00:00:00 2001 From: ZHU Yuhao Date: Sun, 29 Mar 2026 18:20:18 +0200 Subject: [PATCH 1/6] Update readme --- README.md | 4 +--- docs/readme_cli.md | 0 2 files changed, 1 insertion(+), 3 deletions(-) create mode 100644 docs/readme_cli.md diff --git a/README.md b/README.md index 6020858b..a9cd6a2a 100644 --- a/README.md +++ b/README.md @@ -40,11 +40,9 @@ The core types are[^auxiliary]: | `Decimal` | `BDec`, `BigDecimal` | Equivalent to Python's `decimal.Decimal` | Base-10^9 | | `Dec128` | `Decimal128` | 128-bit fixed-precision decimal type | Triple 32-bit words | ---- - **Decimo** combines "**Deci**mal" and "**Mo**jo" - reflecting its purpose and implementation language. "Decimo" is also a Latin word meaning "tenth" and is the root of the word "decimal". ---- +A CLI calculator tool, built on top of the Decimo library and powered by [ArgMojo](https://github.com/forfudan/argmojo) (a command-line argument parser library), is also available in this repository. It provides a convenient way to perform high-precision calculations directly from the command line. This repository includes a built-in [TOML parser](./docs/readme_toml.md) (`decimo.toml`), a lightweight pure-Mojo implementation supporting TOML v1.0. It parses configuration files and test data, supporting basic types, arrays, and nested tables. While created for Decimo's testing framework, it offers general-purpose structured data parsing with a clean, simple API. diff --git a/docs/readme_cli.md b/docs/readme_cli.md new file mode 100644 index 00000000..e69de29b From c79efb6d15223d19de6f2f0df2fceb43694cd9fb Mon Sep 17 00:00:00 2001 From: ZHU Yuhao Date: Fri, 3 Apr 2026 19:34:14 +0200 Subject: [PATCH 2/6] Add plan for gmp integration --- docs/plans/gmp_integration.md | 1411 +++++++++++++++++++++++++++++++++ 1 file changed, 1411 insertions(+) create mode 100644 docs/plans/gmp_integration.md diff --git a/docs/plans/gmp_integration.md b/docs/plans/gmp_integration.md new file mode 100644 index 00000000..4fbb5c4b --- /dev/null +++ b/docs/plans/gmp_integration.md @@ -0,0 +1,1411 @@ +# GMP Integration Plan for Decimo + +> **Date**: 2026-04-02 +> **Target**: decimo >=0.9.0 +> **Mojo Version**: >=0.26.2 +> **GMP Version**: 6.3.0 (tested on macOS ARM64, Apple Silicon) +> +> 子曰工欲善其事必先利其器 +> The mechanic, who wishes to do his work well, must first sharpen his tools -- Confucius + +## 1. Executive Summary + +### Verdict: FEASIBLE ✓ + +GMP integration into Decimo is technically feasible and has been proven through a working prototype. A C wrapper approach (`gmp_wrapper.c`) successfully bridges Mojo's FFI layer to GMP, with all core arithmetic operations verified. + +### Key Findings + +| Aspect | Status | Notes | +| ------------------------ | ----------- | ------------------------------------------------------------------------------- | +| Mojo → C FFI | ✓ Works | Via `external_call` with link-time binding | +| GMP arithmetic ops | ✓ All work | Add, sub, mul, div, mod, pow, sqrt, gcd, bitwise | +| Build pipeline | ✓ Works | `mojo build` with `-Xlinker` flags | +| Runtime detection | ! Partial | Build-time detection feasible; true runtime dlopen not available in Mojo 0.26.2 | +| BigInt ↔ GMP conversion | ✓ Efficient | Both use base-2^32, near-trivial O(n) conversion | +| BigUInt/BigDecimal ↔ GMP | ! Costly | Requires base-10^9 → binary conversion (O(n²) naïve, O(n·log²n) D&C) | +| `mojo run` (JIT) | ✗ Broken | `-Xlinker` flags ignored in JIT mode; only `mojo build` works | +| Cross-platform | ! Varies | macOS/Linux straightforward; Windows needs MPIR or MinGW GMP | + +### Recommendation + +Integrate GMP as an **optional backend for BigInt** (base-2^32) first, where data conversion +is cheap and measured speedups are 7–18× across operations. BigDecimal integration should +follow only for operations where GMP's speed advantage (15–353× in benchmarks) outweighs +the base-10⁹ ↔ binary conversion cost — primarily large multiplications, sqrt, and GCD at +1,000+ digits. + +--- + +## 2. Technical Feasibility Assessment + +### 2.1 What Was Tested + +A complete prototype was built in `temp/gmp/` consisting of: + +1. **C wrapper** (`gmp_wrapper.c`): Handle-based API wrapping GMP's `mpz_t` operations +2. **Shared library** (`libgmp_wrapper.dylib`): Compiled C wrapper linked against GMP +3. **Mojo FFI test** (`test_gmp_ffi.mojo`): 18 tests covering all operation categories +4. **Mojo benchmark** (`bench_gmp_vs_decimo.mojo`): Performance characterization across digit sizes + +### 2.2 Prototype Results + +| Test Category | Result | +| -------------------------- | ------------------------------------- | +| Initialization & cleanup | ✓ Pass | +| String import/export | ✓ Pass | +| Addition, subtraction | ✓ Pass | +| Multiplication | ✓ Pass | +| Floor/truncate division | ✓ Pass | +| Modulo | ✓ Pass | +| Power, modular power | ✓ Pass | +| Square root | ✓ Pass | +| GCD | ✓ Pass | +| Negation, absolute value | ✓ Pass | +| Comparison, sign | ✓ Pass | +| Bitwise AND, OR, XOR | ✓ Pass | +| Left/right shift | ✓ Pass | +| Availability/version check | ✓ Pass | +| UInt32 word import/export | ! Partial (parameter alignment issue) | + +### 2.3 Why C Wrapper Is Required + +Direct FFI to GMP is impractical in Mojo 0.26.2 because: + +1. **No `DLHandle`**: Mojo 0.26.2 does not expose `DLHandle` for runtime dynamic library loading. Only `external_call` (compile-time/link-time binding) is available. +2. **`mpz_t` is a struct**: GMP's `mpz_t` is `__mpz_struct[1]` (a 1-element array used as a pass-by-reference trick). Mojo cannot directly represent or pass this. +3. **Pointer origin tracking**: `UnsafePointer` in Mojo 0.26.2 has strict origin tracking that prevents constructing pointers from raw integer addresses in non-`fn` contexts. +4. **GMP macros**: Many GMP operations (e.g., `mpz_init`, `mpz_clear`) are preprocessor macros, not true function symbols. + +The C wrapper solves all these issues by exposing a flat, handle-based C API where Mojo passes and receives only primitive types (`Int32`, `Int`). + +--- + +## 3. Mojo FFI Capabilities and Constraints + +### 3.1 Available FFI Mechanism + +```mojo +from sys.ffi import external_call + +# Example: calling a C function +var result = external_call["gmpw_add", NoneType, Int32, Int32, Int32](a, b, c) +``` + +- `external_call` resolves symbols at **link time** (not runtime) +- Function signature must be known at compile time +- Supports: `Int`, `Int32`, `UInt32`, `Float64`, `NoneType`, pointer-as-`Int` +- Does **not** support: variadic args, struct-by-value, function pointers + +### 3.2 Constraints Discovered + +| Constraint | Impact | Workaround | +| -------------------------------- | ------------------------------------------------- | ------------------------------------------------------- | +| No `DLHandle` (runtime dlopen) | Cannot conditionally load GMP at runtime | Compile-time feature flag or separate build targets | +| `-Xlinker` ignored in `mojo run` | JIT mode cannot link external libs | Must use `mojo build` + execute binary | +| `UnsafePointer` origin tracking | Cannot construct pointers from addresses in `def` | Pass pointers as `Int` type | +| No struct-by-value in FFI | Cannot pass/return C structs | Handle-based API (C wrapper manages structs internally) | +| `UnsafePointer.alloc()` removed | Cannot allocate raw memory directly | Use `List[T](length=n, fill=v)` + `.unsafe_ptr()` | +| Same-pointer aliasing forbidden | Cannot pass same buffer twice to `external_call` | Use separate buffer variables | + +### 3.3 Future Mojo Improvements That Would Help + +- **`DLHandle` support**: Would enable true runtime detection and conditional loading +- **Stable `UnsafePointer` API**: Would simplify pointer handling +- **Global variables**: Would allow module-level GMP backend state +- **Conditional compilation (`@static_if`)**: Would enable clean feature toggling + +--- + +## 4. Architecture Analysis + +### 4.1 Current Decimo Architecture + +```txt +Decimo Types Internal Representation +───────────── ─────────────────────── +BigInt (BInt) → List[UInt32] in base-2^32 + Bool sign +BigUInt → List[UInt32] in base-10^9 (unsigned) +BigDecimal (Decimal) → BigUInt coefficient + Int scale + Bool sign +Dec128 → 3×UInt32 coefficient + UInt32 flags (fixed 128-bit) +``` + +### 4.2 GMP Architecture + +```txt +GMP Types Internal Representation +───────────── ─────────────────────── +mpz_t → mp_limb_t* (base-2^64 on ARM64) + int size + int alloc +mpq_t → mpz_t numerator + mpz_t denominator +mpf_t → Binary floating-point with configurable precision +``` + +### 4.3 Type Mapping + +| Decimo Type | GMP Equivalent | Conversion Cost | Integration Priority | +| -------------- | ----------------------------------- | ----------------------------------------------------- | -------------------- | +| **BigInt** | `mpz_t` | **Low** — same binary base (2^32 → 2^64 word packing) | **Phase 1** (HIGH) | +| **BigUInt** | `mpz_t` (unsigned subset) | **High** — base-10^9 → binary conversion | Phase 3 (LOW) | +| **BigDecimal** | `mpf_t` or `mpz_t` + manual scaling | **High** — base conversion + scale management | Phase 2 (MEDIUM) | +| **Dec128** | Not applicable | N/A — fixed precision, already fast | Skip | + +### 4.4 Where GMP Adds Value + +GMP's advantages over Decimo's native implementation: + +1. **Highly optimized assembly**: GMP includes hand-tuned ARM64 assembly for core operations +2. **Mature algorithm suite**: Decades of optimization (Schönhage-Strassen for huge multiplications, sub-quadratic GCD, etc.) +3. **Large-number regime**: GMP's algorithmic cutoffs are well-tuned for every platform +4. **Additional algorithms**: FFT-based multiplication, extended GCD, primality testing, etc. + +Where GMP does **not** add value: + +1. **Small numbers** (<50 digits): FFI overhead dominates +2. **Dec128**: Fixed 128-bit operations are already optimal with native code +3. **Base-10 operations**: GMP works in binary; base-10 output requires conversion +4. **Decimal arithmetic**: GMP has no native decimal type; scale management would remain in Mojo + +--- + +## 5. Performance Comparison: GMP vs Decimo + +All benchmarks run side-by-side in a single binary (`bench_gmp_vs_decimo.mojo`), compiled +with `mojo build -O3` (release mode, assertions off). Each data point is the **median of +7 runs**. Platform: Apple M2, macOS, GMP 6.3.0, Mojo 0.26.2. + +### 5.1 BigInt (base-2³²) vs GMP `mpz` + +These compare Decimo's `BigInt` (Karatsuba multiply, Burnikel-Ziegler divide) against +GMP's `mpz_*` functions. Both operate in binary representation; no base conversion needed. + +#### Multiplication (n-digit × n-digit) + +| Digits | GMP | Decimo | Decimo / GMP | +| ------ | -------- | --------- | ------------ | +| 50 | 12 ns | 106 ns | 8.8× | +| 100 | 21 ns | 156 ns | 7.4× | +| 500 | 223 ns | 2.20 µs | 9.9× | +| 1,000 | 733 ns | 7.45 µs | 10.2× | +| 5,000 | 10.40 µs | 127.44 µs | 12.3× | +| 10,000 | 24.45 µs | 380.81 µs | 15.6× | + +GMP is **8–16× faster** for multiplication, with the gap widening at larger sizes due to +GMP's hand-tuned assembly (Karatsuba / Toom-3 / FFT) versus Decimo's pure-Mojo Karatsuba. + +#### Addition (n-digit + n-digit) + +| Digits | GMP | Decimo | Decimo / GMP | +| ------ | ------ | ------- | ------------ | +| 50 | 5 ns | 51 ns | 10.2× | +| 100 | 5 ns | 56 ns | 11.2× | +| 500 | 8 ns | 111 ns | 13.9× | +| 1,000 | 12 ns | 181 ns | 15.1× | +| 5,000 | 69 ns | 667 ns | 9.7× | +| 10,000 | 150 ns | 1.24 µs | 8.2× | + +GMP is **8–15× faster** for addition. GMP uses assembly-optimized `mpn_add_n`; Decimo +uses SIMD vectorization but cannot match hand-written assembly. + +#### Floor Division (n-digit / (n/2)-digit) + +| Digits | GMP | Decimo | Decimo / GMP | +| ------ | -------- | --------- | ------------ | +| 100 | 44 ns | 510 ns | 11.6× | +| 500 | 218 ns | 1.76 µs | 8.1× | +| 1,000 | 593 ns | 5.47 µs | 9.2× | +| 5,000 | 7.16 µs | 81.73 µs | 11.4× | +| 10,000 | 22.62 µs | 254.62 µs | 11.3× | + +GMP is **8–12× faster** for division. Both use sub-quadratic algorithms (Burnikel-Ziegler), +but GMP's multiply subroutine is faster, and its lower-level code is more optimized. + +#### Integer Square Root + +| Digits | GMP | Decimo | Decimo / GMP | +| ------ | -------- | --------- | ------------ | +| 50 | 44 ns | 205 ns | 4.7× | +| 100 | 81 ns | 709 ns | 8.8× | +| 500 | 273 ns | 3.55 µs | 13.0× | +| 1,000 | 569 ns | 8.41 µs | 14.8× | +| 5,000 | 5.90 µs | 104.67 µs | 17.7× | +| 10,000 | 17.52 µs | 322.26 µs | 18.4× | + +GMP is **5–18× faster** for sqrt. Both use Newton's method, but GMP's inner multiply/divide +is much faster, and the gap accumulates over Newton iterations. + +#### GCD + +| Digits | GMP | Decimo | Decimo / GMP | +| ------ | ------- | ------- | ------------ | +| 50 | 53 ns | 216 ns | 4.1× | +| 100 | 45 ns | 247 ns | 5.5× | +| 500 | 93 ns | 543 ns | 5.8× | +| 1,000 | 142 ns | 955 ns | 6.7× | +| 5,000 | 646 ns | 4.21 µs | 6.5× | +| 10,000 | 1.95 µs | 8.02 µs | 4.1× | + +GMP is **4–7× faster** for GCD. GMP uses a sub-quadratic half-GCD algorithm (Lehmer/​Schönhage); +Decimo currently uses the Euclidean algorithm. This is the smallest gap of all operations. + +### 5.2 BigDecimal (base-10⁹) vs GMP Integer Equivalents + +GMP has no native decimal type. To compare fairly, we benchmark GMP performing the +**equivalent integer computation** that a GMP-backed BigDecimal would perform: + +- **Multiply**: GMP integer multiply on same-digit-count coefficients +- **Sqrt(2) at precision P**: GMP `isqrt(2 × 10^(2P))` vs Decimo `BigDecimal.sqrt(precision=P)` +- **Division (1/3) at precision P**: GMP `(1 × 10^P) / 3` vs Decimo `BigDecimal.true_divide(…, precision=P)` + +These ratios represent the **best-case speedup** from a GMP backend, before +accounting for base-10⁹ ↔ binary conversion overhead. + +#### BigDecimal Multiplication (n sig-digits × n sig-digits) + +| Digits | GMP (int mul) | Decimo (BigDecimal) | Decimo / GMP | +| ------ | ------------- | ------------------- | ------------ | +| 28 | 9 ns | 137 ns | 15.2× | +| 100 | 21 ns | 327 ns | 15.6× | +| 500 | 226 ns | 7.92 µs | 35.1× | +| 1,000 | 725 ns | 25.75 µs | 35.5× | +| 5,000 | 10.72 µs | 309.71 µs | 28.9× | +| 10,000 | 24.63 µs | 923.19 µs | 37.5× | + +BigDecimal multiply is **15–38× slower** than GMP integer multiply. This large gap comes +from two factors: (1) BigDecimal uses base-10⁹ `BigUInt` internally (not the faster base-2³² +`BigInt`), and (2) GMP's assembly is faster than either Decimo representation. + +#### BigDecimal Sqrt(2) at Various Precisions + +| Precision | GMP (isqrt) | Decimo (BigDecimal.sqrt) | Decimo / GMP | +| --------- | ----------- | ------------------------ | ------------ | +| 28 | 32 ns | 3.48 µs | 109× | +| 100 | 112 ns | 9.58 µs | 86× | +| 500 | 430 ns | 93.41 µs | 217× | +| 1,000 | 1.05 µs | 258.40 µs | 246× | +| 2,000 | 2.75 µs | 869.55 µs | 316× | +| 5,000 | 11.10 µs | 3.92 ms | 353× | + +BigDecimal sqrt is **86–353× slower** than GMP's integer sqrt equivalent. This extreme gap +reflects the compounding cost of Newton iteration: every iteration requires multiple +BigDecimal multiplies and divides, each of which is individually 15–35× slower than GMP. + +#### BigDecimal Division (1/3) at Various Precisions + +| Precision | GMP (int div) | Decimo (BigDecimal.true_divide) | Decimo / GMP | +| --------- | ------------- | ------------------------------- | ------------ | +| 28 | 10 ns | 426 ns | 42.6× | +| 100 | 18 ns | 496 ns | 27.6× | +| 500 | 70 ns | 830 ns | 11.9× | +| 1,000 | 150 ns | 1.30 µs | 8.7× | +| 2,000 | 300 ns | 2.30 µs | 7.7× | +| 5,000 | 800 ns | 4.80 µs | 6.0× | + +BigDecimal division is **6–43× slower**. The ratio decreases at higher precision because +Decimo's division cost is dominated by the final long-division step (linear), while at +low precision the constant overhead of BigDecimal setup is proportionally larger. + +### 5.3 Summary of Speedup Ratios + +| Operation | Small (≤100d) | Medium (~1,000d) | Large (≥5,000d) | Key Driver | +| ------------------- | ------------- | ---------------- | --------------- | --------------------------------------- | +| BigInt Multiply | 7–9× | 10× | 12–16× | GMP asm multiply (Karatsuba/Toom/FFT) | +| BigInt Add | 10–11× | 15× | 8–10× | GMP asm `mpn_add_n` | +| BigInt Floor Divide | 12× | 9× | 11× | GMP asm multiply in Burnikel-Ziegler | +| BigInt Sqrt | 5–9× | 15× | 18× | Multiply gap compounds in Newton iters | +| BigInt GCD | 4–6× | 7× | 4–7× | GMP sub-quadratic half-GCD | +| BigDecimal Multiply | 15–16× | 36× | 29–38× | base-10⁹ overhead + GMP asm | +| BigDecimal Sqrt | 86–109× | 246× | 316–353× | Newton iteration compounds the gap | +| BigDecimal Divide | 28–43× | 9× | 6–8× | Gap narrows as division dominates setup | + +### 5.4 FFI Overhead Consideration + +The C wrapper call path has fixed overhead: + +1. `external_call` dispatch: ~1–2 ns +2. Handle array lookup: ~1–2 ns +3. Total fixed overhead: **~3–5 ns per call** + +This means: + +- At **10 digits**: overhead is 30–100% of operation time → **GMP not beneficial** +- At **100 digits**: overhead is 5–10% → **GMP moderately beneficial** +- At **1,000+ digits**: overhead is <0.5% → **GMP clearly beneficial** + +**Recommended crossover threshold**: Use GMP backend for numbers with **≥64 words** (~600 +digits for BigInt). Below this, Decimo's native implementation avoids FFI overhead. + +### 5.5 Technical Note: Mojo String FFI Safety + +During benchmarking, we discovered a Mojo compiler issue where the optimizer may drop a +`String` parameter's backing buffer before the `external_call` using its `unsafe_ptr()` +completes. Under heavy allocation pressure, the freed buffer gets reused immediately, +causing GMP's `mpz_set_str` to read corrupted data. + +**Workaround**: A length-aware C function `gmpw_set_str_n(handle, ptr, len)` was added that +copies exactly `len` bytes into a null-terminated buffer before calling GMP, making the +FFI call safe regardless of the caller's buffer lifetime. Any production integration must +use the length-based variant (`gmpw_set_str_n`) or ensure String buffers remain live +through explicit variable binding. + +--- + +## 6. Operation Coverage Mapping + +### 6.1 BigInt Operations → GMP Functions + +| Decimo BigInt Operation | GMP Function (via wrapper) | Status | +| ------------------------ | ----------------------------------- | -------------- | +| `__add__` | `mpz_add` → `gmpw_add` | ✓ Tested | +| `__sub__` | `mpz_sub` → `gmpw_sub` | ✓ Tested | +| `__mul__` | `mpz_mul` → `gmpw_mul` | ✓ Tested | +| `__floordiv__` | `mpz_fdiv_q` → `gmpw_fdiv_q` | ✓ Tested | +| `__truediv__` | `mpz_tdiv_q` → `gmpw_tdiv_q` | ✓ Tested | +| `__mod__` | `mpz_mod` → `gmpw_mod` | ✓ Tested | +| `__pow__` | `mpz_pow_ui` → `gmpw_pow_ui` | ✓ Tested | +| `mod_pow` | `mpz_powm` → `gmpw_powm` | ✓ Tested | +| `sqrt` / `isqrt` | `mpz_sqrt` → `gmpw_sqrt` | ✓ Tested | +| `gcd` | `mpz_gcd` → `gmpw_gcd` | ✓ Tested | +| `__neg__` | `mpz_neg` → `gmpw_neg` | ✓ Tested | +| `__abs__` | `mpz_abs` → `gmpw_abs` | ✓ Tested | +| `__eq__`, `__lt__`, etc. | `mpz_cmp` → `gmpw_cmp` | ✓ Tested | +| `__and__` | `mpz_and` → `gmpw_and` | ✓ Tested | +| `__or__` | `mpz_ior` → `gmpw_or` | ✓ Tested | +| `__xor__` | `mpz_xor` → `gmpw_xor` | ✓ Tested | +| `__lshift__` | `mpz_mul_2exp` → `gmpw_lshift` | ✓ Tested | +| `__rshift__` | `mpz_fdiv_q_2exp` → `gmpw_rshift` | ✓ Tested | +| `lcm` | `mpz_lcm` | Wrapper needed | +| `extended_gcd` | `mpz_gcdext` | Wrapper needed | +| `mod_inverse` | `mpz_invert` | Wrapper needed | +| `from_string` | `mpz_set_str` → `gmpw_set_str` | ✓ Tested | +| `__str__` | `mpz_get_str` → `gmpw_get_str_buf` | ✓ Tested | +| Word import | `mpz_import` → `gmpw_import_u32_le` | ! Partial | +| Word export | `mpz_export` → `gmpw_export_u32_le` | ! Partial | + +### 6.2 BigDecimal Operations → GMP Coverage + +BigDecimal operations would use GMP indirectly by converting the coefficient (BigUInt) to/from GMP: + +| Operation | GMP Applicability | Notes | +| ----------- | ----------------- | ---------------------------------------------------------------------- | +| Add/Sub | ! Low value | Scale alignment dominates; coefficient add is cheap | +| Multiply | ✓ High value | Large coefficient multiplication benefits from GMP | +| Divide | ✓ High value | Newton iteration on large coefficients benefits | +| Sqrt | ✓ High value | Large-precision sqrt benefits from GMP's optimized Newton | +| Exp/Ln/Trig | ! Mixed | Series evaluation is Mojo-managed; inner multiplications could use GMP | +| Round | ✗ No value | Pure digit manipulation, no heavy arithmetic | + +### 6.3 Operations NOT in GMP + +These must remain native Mojo: + +- Decimal rounding (all modes) +- Scale/exponent management +- Base-10 digit manipulation +- Trigonometric series evaluation logic +- TOML parsing +- CLI calculator evaluation +- Dec128 operations (fixed-precision) + +--- + +## 7. Data Conversion Between Decimo and GMP + +### 7.1 BigInt ↔ GMP (Efficient) + +BigInt uses base-2^32 `List[UInt32]` in little-endian order — essentially the same representation as GMP's `mpz_t` but with 32-bit limbs instead of 64-bit. + +**Conversion approach**: + +```txt +BigInt.words: [w0, w1, w2, w3, ...] (UInt32, little-endian) +GMP limbs: [w0|w1, w2|w3, ...] (UInt64, little-endian) +``` + +Using `mpz_import`: + +```c +// Import BigInt words into GMP +mpz_import(z, word_count, -1, sizeof(uint32_t), 0, 0, words_ptr); +// -1 = least significant word first (little-endian word order) +// sizeof(uint32_t) = 4 bytes per word +// 0 = native byte order within word +// 0 = no nail bits +``` + +Using `mpz_export`: + +```c +// Export GMP value to BigInt words +size_t count; +mpz_export(words_ptr, &count, -1, sizeof(uint32_t), 0, 0, z); +``` + +**Cost**: O(n) memory copy — essentially free compared to the arithmetic operations. + +**Sign handling**: BigInt stores sign as separate `Bool`; GMP stores sign in `_mp_size` (negative = negative). Trivial to map. + +**Implementation in C wrapper** (already prototyped): + +```c +void gmpw_import_u32_le(int handle, const uint32_t* words, int count); +int gmpw_export_u32_le(int handle, uint32_t* out_buf, int max_count); +``` + +### 7.2 BigUInt ↔ GMP (Expensive) + +BigUInt uses base-10^9 — a fundamentally different number base than GMP's binary format. + +**Conversion options**: + +| Method | Complexity | Description | +| ------------------ | ---------- | -------------------------------------------------------------- | +| String round-trip | O(n²) | `BigUInt → String → mpz_set_str` — simple but slow | +| Horner's method | O(n²) | Evaluate polynomial: `((w[n-1] × 10^9 + w[n-2]) × 10^9 + ...)` | +| Divide-and-conquer | O(n·log²n) | Split at midpoint, convert halves, combine with shift+add | + +**Recommended approach**: + +- For <1000 words: Horner's method (simpler, O(n²) acceptable) +- For ≥1000 words: Divide-and-conquer using GMP's own multiply for combining + +**Break-even analysis**: If we assume base conversion costs ~O(n²) for n words: + +- At 100 words (~900 digits): conversion ~10 µs, GMP multiply saves ~4 µs → **NOT worth it** +- At 1000 words (~9000 digits): conversion ~1 ms, GMP multiply saves ~150 µs → **NOT worth it for single operation** +- **Conclusion**: BigUInt→GMP conversion is only justified when amortized across many operations (e.g., performing multiple root/exp iterations on the same number). + +### 7.3 Recommended Strategy + +```txt + ┌──────────────────────────────────────┐ + BigInt operations │ Direct word import/export (O(n)) │ → Phase 1 + │ Cheap, always beneficial > 64 words │ + └──────────────────────────────────────┘ + + ┌──────────────────────────────────────┐ + BigDecimal ops │ Convert once, compute many times │ → Phase 2 + (iterative: sqrt, │ Cache GMP representation during │ + exp, ln, etc.) │ multi-step computations │ + └──────────────────────────────────────┘ + + ┌──────────────────────────────────────┐ + BigDecimal ops │ Skip GMP — conversion cost │ → Phase 3 (or skip) + (single-shot: │ exceeds arithmetic savings │ + add, sub, round) │ │ + └──────────────────────────────────────┘ +``` + +--- + +## 8. GMP Detection and Fallback Strategy + +### 8.1 Detection Approaches + +Since Mojo 0.26.2 lacks `DLHandle` for runtime library probing, detection must be **build-time** or **semi-static**: + +#### Option A: Compile-Time Feature Flag (Recommended for Now) + +```toml +# pixi.toml +[feature.gmp] +tasks.package-gmp = "mojo build -DGMP_ENABLED -Xlinker -lgmp_wrapper ..." +``` + +```mojo +# In Decimo source (when Mojo supports comptime parameters) +@parameter +if GMP_ENABLED: + alias _engine = GMPEngine +else: + alias _engine = NativeEngine +``` + +**Pros**: Zero overhead, no runtime branching +**Cons**: Requires separate builds for GMP and non-GMP versions + +#### Option B: Probe Function at Module Load + +```mojo +# Use the C wrapper's availability check +fn _check_gmp_available() -> Bool: + """Call gmpw_available() which returns 1 if GMP is linked.""" + try: + var result = external_call["gmpw_available", Int32]() + return result == 1 + except: + return False + +# Cache at first use (workaround for no global variables) +struct _GMPState: + var checked: Bool + var available: Bool +``` + +**Pros**: Single binary works with or without GMP +**Cons**: Requires the wrapper to be linked (even if GMP is absent) + +#### Option C: Two-Library Approach + +Build two versions of the wrapper: + +1. `libgmp_wrapper.dylib` — full GMP wrapper (requires GMP) +2. `libgmp_stub.dylib` — stub that returns "unavailable" for all functions + +Link against whichever is present. The stub library lets Decimo compile and run without GMP. + +**Pros**: Clean separation, single Mojo codebase +**Cons**: Requires distributing two library variants + +### 8.2 Recommended Detection Strategy + +**Phase 1 (immediate)**: Use **Option A** — compile-time feature flag. Build two package variants: + +- `decimo.mojopkg` — native only (default) +- `decimo_gmp.mojopkg` — with GMP backend + +**Phase 2 (when Mojo matures)**: Migrate to **Option B** with `DLHandle` when available, enabling true runtime detection. + +### 8.3 Fallback Behavior + +```txt +User calls BigInt.multiply(a, b): + 1. Check engine preference (from parameter or default) + 2. If engine == GMP and gmp_available: + a. Check size threshold (≥64 words?) + b. If yes: convert words → GMP, compute, convert back + c. If no: use native Mojo implementation + 3. If engine == Native or !gmp_available: + → Use native Mojo implementation (current code) +``` + +--- + +## 9. API Design: `gmp: Bool = False` Parameter + +### 9.1 Design Rationale + +Since Mojo lacks global variables, a module-level context or engine setting is not +possible. Instead, we add a **`gmp: Bool = False`** keyword parameter to compute-heavy +BigDecimal functions. This is: + +- **Explicit** — users opt in per-call; no hidden magic +- **Backward-compatible** — defaults to `False`, existing code unchanged +- **Composable** — users pass `gmp=True` down through complex expressions like π calculation +- **Simple** — a Bool is clearer than an engine enum for two choices + +### 9.2 Which Functions Get the `gmp` Parameter + +Only **iterative compute-heavy** operations — not add/sub/multiply (which are building blocks +called internally and whose overhead is handled by the GMPDecimal struct). + +| Function | Gets `gmp`? | Reason | +| ------------------------------------------ | ----------- | ---------------------------------------------- | +| `true_divide(x, y, precision, gmp)` | ✓ Yes | Long division at high precision benefits | +| `sqrt(x, precision, gmp)` | ✓ Yes | Newton iteration: 86–353× gap | +| `root(x, n, precision, gmp)` | ✓ Yes | Newton iteration like sqrt | +| `exp(x, precision, gmp)` | ✓ Yes | Taylor series with many multiplications | +| `ln(x, precision, gmp)` | ✓ Yes | Atanh-based approach with iterative multiplies | +| `sin(x, precision, gmp)` | ✓ Yes | Taylor series | +| `cos(x, precision, gmp)` | ✓ Yes | Taylor series | +| `tan(x, precision, gmp)` | ✓ Yes | Uses sin/cos | +| `pi(precision, gmp)` | ✓ Yes | Chudnovsky binary splitting: many large muls | +| `BigDecimal.__add__`, `__sub__`, `__mul__` | ✗ No | Handled internally by GMPDecimal | +| `BigDecimal.__floordiv__`, `round` | ✗ No | Digit manipulation, no heavy arithmetic | + +### 9.3 Proposed Signatures + +```mojo +# Free functions (primary API) +def true_divide(x: BigDecimal, y: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def sqrt(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def root(x: BigDecimal, n: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def exp(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def ln(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def sin(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def cos(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def tan(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: +def pi(precision: Int, gmp: Bool = False) raises -> BigDecimal: + +# Instance methods (delegate to free functions) +def sqrt(self, precision: Int = PRECISION, gmp: Bool = False) raises -> Self: + return exponential.sqrt(self, precision, gmp) +``` + +### 9.4 Why MPFR, Not `mpz_t` with Manual Scale + +GMP provides three numeric types: + +| Type | What it stores | Mojo analogue | Use for BigDecimal? | +| -------- | --------------------------------------------------- | ------------- | --------------------------------------------------------------- | +| `mpz_t` | Arbitrary-precision **integer** | BigInt | ✗ Poor — must manually track scale, sign, scale-up for division | +| `mpf_t` | Arbitrary-precision **binary float** | (none) | △ Works but legacy, no correct rounding | +| `mpfr_t` | Arbitrary-precision **binary float** (MPFR library) | BigDecimal | **✓ Best** — correctly rounded, built-in exp/ln/sin/cos/π | + +If we used `mpz_t` (integer only), we'd need to: + +- Track `scale` and `sign` separately in GMPDecimal (3 fields) +- Manually scale up the numerator by 10^N before integer division +- Implement Newton iterations for sqrt, Taylor series for exp/ln/trig — **ourselves** +- Handle scale alignment for addition/subtraction + +With **MPFR** (`mpfr_t`), all of this is handled internally: + +- Division: `mpfr_div(r, a, b, MPFR_RNDN)` — one call +- Sqrt: `mpfr_sqrt(r, a, MPFR_RNDN)` — one call +- Exp: `mpfr_exp(r, a, MPFR_RNDN)` — one call, correctly rounded +- Ln: `mpfr_log(r, a, MPFR_RNDN)` — one call +- Sin/Cos/Tan: `mpfr_sin`, `mpfr_cos`, `mpfr_tan` — one call each +- π: `mpfr_const_pi(r, MPFR_RNDN)` — one call, arbitrary precision + +**MPFR** is a separate library built on top of GMP. It is the standard for +arbitrary-precision floating-point and is what Python's `mpmath` (via `gmpy2`) +uses under the hood. It is widely available: `brew install mpfr` (macOS), +`apt install libmpfr-dev` (Linux). + +### 9.5 The GMPDecimal Intermediate Struct (MPFR-Based) + +Because MPFR natively handles sign, scale, and precision, the struct simplifies +to **a single field** — the MPFR handle: + +```mojo +struct GMPDecimal: + """A decimal value held in MPFR floating-point representation. + + Wraps a single mpfr_t handle. All arithmetic, sqrt, exp, ln, trig + are single MPFR calls. Convert to/from BigDecimal only at entry/exit. + """ + var handle: Int32 # MPFR handle (via C wrapper) + + def __init__(out self, bd: BigDecimal, precision: Int): + """Convert BigDecimal → GMPDecimal. + + precision: decimal digits wanted → converted to bits internally. + """ + # 1 decimal digit ≈ 3.322 bits; add guard bits + var bits = precision * 4 + 64 + self.handle = mpfrw_init(bits) + # Pass full decimal string (e.g. "-3.14159") directly + mpfrw_set_str(self.handle, str(bd)) + + def to_bigdecimal(self, precision: Int) -> BigDecimal: + """Convert GMPDecimal → BigDecimal. + + Exports MPFR value as decimal string, truncated to `precision` digits. + """ + var s = mpfrw_get_str(self.handle, precision) + return BigDecimal(s) + + # Arithmetic: each is a single MPFR call + def multiply(self, other: GMPDecimal) -> GMPDecimal: ... + def add(self, other: GMPDecimal) -> GMPDecimal: ... + def divide(self, other: GMPDecimal) -> GMPDecimal: ... + def sqrt(self) -> GMPDecimal: ... + def exp(self) -> GMPDecimal: ... + def ln(self) -> GMPDecimal: ... + def sin(self) -> GMPDecimal: ... + def cos(self) -> GMPDecimal: ... + def pi(precision: Int) -> GMPDecimal: ... # static + + def __del__(owned self): + """RAII: free MPFR handle on destruction.""" + mpfrw_clear(self.handle) +``` + +Compare the old vs. new struct: + +| Aspect | Old (mpz_t + manual scale) | New (MPFR) | +| --------------- | ------------------------------------------- | ----------------------------- | +| Fields | `handle`, `scale`, `sign` (3 fields) | `handle` (1 field) | +| Division | Scale up by 10^N, `mpz_tdiv_q`, track scale | `mpfr_div(r, a, b, RNDN)` | +| Sqrt | Newton iteration in Mojo using GMPDecimal | `mpfr_sqrt(r, a, RNDN)` | +| Exp | Taylor series in Mojo using GMPDecimal | `mpfr_exp(r, a, RNDN)` | +| Ln | Atanh series in Mojo using GMPDecimal | `mpfr_log(r, a, RNDN)` | +| Sin/Cos/Tan | Taylor series in Mojo | `mpfr_sin` / `mpfr_cos` / ... | +| π | Chudnovsky in Mojo using GMPDecimal | `mpfr_const_pi(r, RNDN)` | +| Sign handling | Manual XOR | Automatic | +| Scale alignment | Manual multiply by 10^diff | Automatic | +| Rounding | Manual truncation | Correctly rounded (MPFR) | + +**Data flow for `gmp=True`**: + +```txt +sqrt(x: BigDecimal, precision=1000, gmp=True): + 1. gx = GMPDecimal(x, precision=1010) ← ONE conversion: string → mpfr_t + 2. mpfr_sqrt(result, gx, MPFR_RNDN) ← ONE GMP call (entire computation) + 3. return result.to_bigdecimal(1000) ← ONE conversion: mpfr_t → string +``` + +For a chained expression like `sqrt(23) * exp(2.1) / pi()`: + +```txt +var a = GMPDecimal(BigDecimal("23"), prec) ← 1 conversion +var b = GMPDecimal(BigDecimal("2.1"), prec) ← 1 conversion +var sa = a.sqrt() ← mpfr_sqrt (one call) +var eb = b.exp() ← mpfr_exp (one call) +var p = GMPDecimal.pi(prec) ← mpfr_const_pi (one call) +var r = sa.multiply(eb).divide(p) ← mpfr_mul + mpfr_div +return r.to_bigdecimal(precision) ← 1 conversion +``` + +Total: 3 conversions (2 in + 1 out) + 5 MPFR calls. No Newton iterations, +no Taylor series, no manual scale tracking. + +### 9.6 Behavioral Differences Between `gmp=False` and `gmp=True` + +**Important**: MPFR is a **binary** floating-point internally (mantissa stored in base-2). +This means: + +1. **Precision is in bits, not digits** — We convert: `bits = ceil(digits × 3.322)`. To + guarantee N correct decimal digits, we request ~N×4 bits (generous guard). + +2. **Last 1–2 decimal digits may differ** — Binary↔decimal rounding means the last few + digits of `gmp=True` output may not match `gmp=False` output exactly. The numerical + value will agree to the requested precision. + +3. **Fix: guard digits** — Compute internally with `precision + 10` extra digits, then + truncate the result to the requested precision. This is standard practice (Python's + `decimal` module does the same). + +4. **MPFR guarantees correct rounding** — Every operation is correctly rounded to the + working precision. This is actually *stronger* than Decimo's native implementation + which may accumulate rounding errors across Newton iterations. + +This is why `gmp` defaults to `False` — so existing behavior is preserved and users +explicitly opt into the MPFR path when they want speed. In practice, the numerical +results from MPFR are at least as accurate as the native implementation. + +### 9.6 CLI Integration (Deferred) + +CLI integration is deferred to a later phase. When implemented, a `--gmp` flag on the +command line would set `gmp=True` for all compute-heavy operations in the session. +The evaluator would pass this flag down through the function dispatch. + +### 9.7 Future: Global Variable or Context + +When Mojo adds global variables, the `gmp` parameter can be replaced or supplemented by +a module-level default: + +```mojo +# Future (when Mojo supports global variables) +BigDecimal.set_default_engine("gmp") # Sets global default +var result = sqrt(x, 1000) # Automatically uses GMP +``` + +The `gmp` parameter will remain as an explicit override even after global variables +are available, for cases where the user wants fine-grained control. + +--- + +## 10. Build System Integration + +### 10.1 C Wrapper Compilation + +The C wrapper must be compiled as a shared library and distributed alongside the Mojo package. + +**Build script** (`build_gmp_wrapper.sh`): + +```bash +#!/bin/bash +# Detect GMP installation +GMP_PREFIX=$(brew --prefix gmp 2>/dev/null || echo "/usr/local") + +if [ ! -f "$GMP_PREFIX/include/gmp.h" ]; then + echo "GMP not found. Building without GMP support." + exit 0 +fi + +# Compile wrapper +cc -shared -O2 -o libgmp_wrapper.dylib gmp_wrapper.c \ + -I"$GMP_PREFIX/include" -L"$GMP_PREFIX/lib" -lgmp + +# Fix install name (macOS) +if [[ "$(uname)" == "Darwin" ]]; then + install_name_tool -id @rpath/libgmp_wrapper.dylib libgmp_wrapper.dylib +fi +``` + +### 10.2 pixi.toml Integration + +```toml +[tasks] +build-gmp-wrapper = """ + bash src/decimo/gmp/build_gmp_wrapper.sh +""" + +package-with-gmp = { depends-on = ["build-gmp-wrapper"] } +package-with-gmp.cmd = """ + mojo package src/decimo -o decimo.mojopkg && \ + cp src/decimo/gmp/libgmp_wrapper.dylib . +""" +``` + +### 10.3 Mojo Build Flags + +For building executables that use Decimo with GMP: + +```bash +# macOS (Homebrew) +mojo build \ + -Xlinker -L./path/to/wrapper -Xlinker -lgmp_wrapper \ + -Xlinker -L/opt/homebrew/lib -Xlinker -lgmp \ + -Xlinker -rpath -Xlinker ./path/to/wrapper \ + -Xlinker -rpath -Xlinker /opt/homebrew/lib \ + -o myprogram myprogram.mojo + +# Linux (system GMP) +mojo build \ + -Xlinker -L./path/to/wrapper -Xlinker -lgmp_wrapper \ + -Xlinker -lgmp \ + -Xlinker -rpath -Xlinker ./path/to/wrapper \ + -o myprogram myprogram.mojo +``` + +### 10.4 Runtime Detection: Single Binary Without Link-Time Dependencies + +**Goal**: A single compiled binary works whether or not MPFR/GMP is installed on the +target machine. If `gmp=True` is requested and MPFR is found, use it. If not found, +raise a helpful error. Users who never call `gmp=True` pay zero cost. + +**Verified**: `dlopen` works from Mojo 0.26.2 via `external_call["dlopen", ...]`. +Tested successfully — GMP was loaded at runtime without any `-Xlinker` flags at +compile time. + +**Architecture**: The C wrapper does lazy loading internally using `dlopen`/`dlsym`, +so it compiles and links **without** needing MPFR/GMP headers or libraries present: + +```c +// gmp_wrapper.c — always compilable, no link-time dependency on libmpfr +#include + +static void *mpfr_lib = NULL; +static int mpfr_loaded = 0; + +// Function pointer types +typedef int (*mpfr_init2_t)(void*, long); +typedef void (*mpfr_clear_t)(void*); +typedef int (*mpfr_set_str_t)(void*, const char*, int, int); +// ... one typedef per MPFR function used + +static mpfr_init2_t p_mpfr_init2 = NULL; +static mpfr_clear_t p_mpfr_clear = NULL; +static mpfr_set_str_t p_mpfr_set_str = NULL; +// ... etc. + +int mpfrw_load(void) { + if (mpfr_loaded) return mpfr_lib != NULL; + mpfr_loaded = 1; + + // Try common library paths + const char *paths[] = { + "libmpfr.dylib", // macOS rpath + "/opt/homebrew/lib/libmpfr.dylib", // macOS ARM64 + "/usr/local/lib/libmpfr.dylib", // macOS x86_64 + "libmpfr.so", // Linux rpath + "/usr/lib/x86_64-linux-gnu/libmpfr.so", // Debian/Ubuntu + "/usr/lib/libmpfr.so", // Generic Linux + NULL + }; + + for (int i = 0; paths[i]; i++) { + mpfr_lib = dlopen(paths[i], RTLD_LAZY); + if (mpfr_lib) break; + } + if (!mpfr_lib) return 0; + + // Resolve all needed symbols + p_mpfr_init2 = dlsym(mpfr_lib, "mpfr_init2"); + p_mpfr_clear = dlsym(mpfr_lib, "mpfr_clear"); + p_mpfr_set_str = dlsym(mpfr_lib, "mpfr_set_str"); + // ... resolve all function pointers + + return (p_mpfr_init2 && p_mpfr_clear && p_mpfr_set_str /* && ... */); +} + +int mpfrw_available(void) { + return mpfrw_load(); +} +``` + +**On the Mojo side**, the wrapper is the only native dependency: + +```mojo +def sqrt(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: + if not gmp: + return _sqrt_native(x, precision) + + # Check MPFR availability at runtime + var available = external_call["mpfrw_available", c_int]() + if available == 0: + raise Error( + "gmp=True requires MPFR. Install: brew install mpfr (macOS) " + "or apt install libmpfr-dev (Linux)" + ) + return _sqrt_mpfr(x, precision) +``` + +**Build simplification**: With lazy loading, the build command for users becomes: + +```bash +# Only need to link the small wrapper — no -lmpfr or -lgmp needed +mojo build -Xlinker -L./path/to/wrapper -Xlinker -ldecimo_gmp_wrapper \ + -Xlinker -rpath -Xlinker ./path/to/wrapper \ + -o myprogram myprogram.mojo +``` + +The wrapper itself is a tiny C file (~200 lines) that can be compiled on any system +with just a C compiler — no GMP/MPFR headers needed. MPFR is only loaded at runtime +when `gmp=True` is actually called. + +**Summary of approach by timeline**: + +| Timeline | Detection Method | Binary Requirements | +| --------------------- | ---------------------------- | ------------------------------------------------ | +| **Now** (Mojo 0.26.2) | C wrapper + `dlopen`/`dlsym` | Link wrapper only; MPFR loaded lazily at runtime | +| **Future** (DLHandle) | Mojo-native `DLHandle` | Pure Mojo detection, no C wrapper needed | + +--- + +## 11. Cross-Platform Considerations + +### 11.1 Platform Matrix + +| Platform | GMP Availability | Wrapper Format | Install Method | Notes | +| ---------------- | ----------------- | -------------- | ------------------------ | --------------------------- | +| **macOS ARM64** | ✓ Homebrew | `.dylib` | `brew install gmp` | **Tested ✓** | +| **macOS x86_64** | ✓ Homebrew | `.dylib` | `brew install gmp` | Should work (untested) | +| **Linux x86_64** | ✓ System packages | `.so` | `apt install libgmp-dev` | Expected to work | +| **Linux ARM64** | ✓ System packages | `.so` | `apt install libgmp-dev` | Expected to work | +| **Windows** | ! Complex | `.dll` | MSYS2/vcpkg/MPIR | Requires MinGW or MPIR fork | + +### 11.2 GMP Detection by Platform + +```bash +# macOS +GMP_PREFIX=$(brew --prefix gmp 2>/dev/null) +# → /opt/homebrew/Cellar/gmp/6.3.0 + +# Linux (Debian/Ubuntu) +dpkg -l libgmp-dev 2>/dev/null | grep -q '^ii' +# Headers: /usr/include/gmp.h +# Library: /usr/lib/x86_64-linux-gnu/libgmp.so + +# Linux (generic) +pkg-config --exists gmp && pkg-config --cflags --libs gmp +``` + +### 11.3 Platform-Specific Considerations + +**macOS**: + +- Homebrew puts GMP in `/opt/homebrew/` (ARM64) or `/usr/local/` (x86_64) +- Must set `@rpath` for `.dylib` loading +- SIP may restrict library search paths + +**Linux**: + +- System GMP is typically in standard paths (`/usr/lib/`, `/usr/include/`) +- No rpath issues with standard paths +- May need `-lgmp` at end of link flags for some linkers + +**Windows** (future): + +- GMP does not natively support MSVC +- Options: MPIR (Windows-friendly GMP fork), MinGW-compiled GMP, or Cygwin +- Different shared library format (`.dll` + import lib) +- Mojo on Windows is still early — defer this concern + +--- + +## 12. Risk Analysis + +### 12.1 Technical Risks + +| Risk | Likelihood | Impact | Mitigation | +| --------------------------------------- | ---------- | ------ | ---------------------------------------------------------- | +| Mojo FFI API changes in future versions | HIGH | MEDIUM | Isolate FFI calls in single module; version-gate if needed | +| GMP version incompatibility | LOW | LOW | Target GMP ≥6.0 (stable ABI); C wrapper abstracts details | +| Handle exhaustion (4096 limit) | LOW | MEDIUM | Add handle recycling; expand pool; document limit | +| Memory leaks from uncleared handles | MEDIUM | MEDIUM | Add `__del__` integration; RAII pattern in Mojo wrapper | +| Base conversion overhead for BigDecimal | HIGH | MEDIUM | Only use GMP for iterative operations; cache conversions | +| Build complexity increase | MEDIUM | MEDIUM | Provide clear build scripts; make GMP optional | + +### 12.2 Ecosystem Risks + +| Risk | Likelihood | Impact | Mitigation | +| ------------------------------- | ---------- | ------ | --------------------------------------------------------- | +| Users don't have GMP installed | HIGH | LOW | GMP is optional; native Mojo is always available | +| Package distribution complexity | MEDIUM | MEDIUM | Document build steps; consider bundling prebuilt wrappers | +| Mojo package system changes | MEDIUM | MEDIUM | Keep wrapper as separate build artifact | + +### 12.3 Performance Risks + +| Risk | Description | Mitigation | +| ---------------------------------------- | ------------------------------------ | ------------------------------------------------ | +| FFI overhead dominates for small numbers | Measured: ~3–5 ns overhead per call | Size threshold (≥64 words) before using GMP | +| String-based conversion bottleneck | O(n²) for string round-trip | Use binary word import/export instead | +| Unnecessary conversions | Converting back and forth repeatedly | Cache GMP handles during multi-step computations | +| GMP slower for specific operations | Possible edge cases | Benchmark-driven thresholds per operation type | + +--- + +## 13. Implementation Roadmap + +> Priority: **BigDecimal first**, using **MPFR** (not raw `mpz_t`). MPFR provides +> built-in sqrt, exp, ln, sin, cos, tan, π with correct rounding — so most BigDecimal +> operations are simple one-call wrappers rather than Newton/Taylor implementations. +> BigInt integration is deferred to Phase 3. + +### Phase 1: Foundation — MPFR C Wrapper & GMPDecimal Struct + +**Goal**: Establish the MPFR-based C wrapper with runtime detection, the `GMPDecimal` +intermediate struct, and prove end-to-end with `sqrt`. + +| Step | Task | Detail | Effort | Deps | +| ---- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- | +| 1.1 | **Install MPFR** | `brew install mpfr` (macOS). Verify: `/opt/homebrew/lib/libmpfr.dylib` exists, `#include ` compiles. MPFR depends on GMP (already installed). | Tiny | — | +| 1.2 | **Extend C wrapper with `mpfr_t` handle pool** | Add to `gmp_wrapper.c`: (a) `static mpfr_t f_handles[MAX_HANDLES]` pool, (b) `mpfrw_init(prec_bits) → handle`, (c) `mpfrw_clear(h)`, (d) `mpfrw_set_str(h, str, len)` (length-safe, base-10), (e) `mpfrw_get_str(h, digits) → char*`, (f) `mpfrw_available()`. Keep existing `mpz_t` wrappers intact for BigInt. | Medium | 1.1 | +| 1.3 | **Add MPFR arithmetic wrappers** | One-line C wrappers: `mpfrw_add(r,a,b)`, `mpfrw_sub(r,a,b)`, `mpfrw_mul(r,a,b)`, `mpfrw_div(r,a,b)`, `mpfrw_sqrt(r,a)`, `mpfrw_neg(r,a)`, `mpfrw_abs(r,a)`, `mpfrw_cmp(a,b)`. All use `MPFR_RNDN` (round-to-nearest). Each is 1 line: e.g. `mpfr_sqrt(f_handles[r], f_handles[a], MPFR_RNDN)`. | Small | 1.2 | +| 1.4 | **Add MPFR transcendental wrappers** | `mpfrw_exp(r,a)`, `mpfrw_log(r,a)`, `mpfrw_sin(r,a)`, `mpfrw_cos(r,a)`, `mpfrw_tan(r,a)`, `mpfrw_const_pi(r)`, `mpfrw_pow(r,a,b)`, `mpfrw_rootn_ui(r,a,n)`. Each is 1 line. | Small | 1.2 | +| 1.5 | **Implement lazy `dlopen` loading** | Wrap all MPFR calls behind function pointers resolved via `dlopen`/`dlsym` at first use (see Section 10.4). This makes the wrapper compilable without MPFR headers. Add `mpfrw_available() → 0/1`. | Medium | 1.3, 1.4 | +| 1.6 | **Compile wrapper, run smoke test** | Build `libdecimo_gmp_wrapper.dylib` linking against `-lmpfr -lgmp`. Write minimal Mojo test: `mpfrw_init(200) → set_str("3.14") → mpfrw_sqrt → get_str → print`. Verify output. | Small | 1.5 | +| 1.7 | **Create `src/decimo/bigdecimal/gmp_decimal.mojo`** | Implement `GMPDecimal` struct (single `handle: Int32` field). Constructor from BigDecimal via `str(bd) → mpfrw_set_str`. `to_bigdecimal(precision) → mpfrw_get_str → BigDecimal(s)`. Destructor calls `mpfrw_clear`. | Medium | 1.6 | +| 1.8 | **Wire `gmp=True` into `sqrt()`** | In `exponential.mojo`: add `gmp: Bool = False` parameter. If True: `GMPDecimal(x, prec+10) → mpfrw_sqrt → to_bigdecimal(prec)`. Three lines of new code. | Small | 1.7 | +| 1.9 | **Test sqrt correctness** | For x ∈ {2, 3, 0.5, 100, 10⁻⁸, 10¹⁰⁰⁰}, P ∈ {28, 100, 500, 1000, 5000}: verify `sqrt(x, P, gmp=True)` matches `sqrt(x, P)` to P digits. | Medium | 1.8 | +| 1.10 | **Benchmark sqrt** | Measure wall time including conversion overhead. Find break-even precision where MPFR wins. Expected: above ~50–100 digits. | Small | 1.9 | +| 1.11 | **Update build system** | `pixi.toml` tasks: `build-gmp-wrapper` (compiles C wrapper + links MPFR). `build_gmp_wrapper.sh` for macOS + Linux. | Small | 1.2 | + +**Deliverable**: `sqrt(BigDecimal("2"), precision=5000, gmp=True)` returns correct result. +Internally: 1 string conversion in, 1 MPFR call, 1 string conversion out. + +### Phase 2: All BigDecimal Operations + +**Goal**: Extend `gmp=True` to all compute-heavy operations. Because MPFR has built-in +functions for each, every operation is just a **thin wrapper** — no Newton/Taylor needed. + +| Step | Task | Detail | Effort | Deps | +| ---- | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ------- | +| 2.1 | **Wire `true_divide(x, y, P, gmp=True)`** | `GMPDecimal(x) → GMPDecimal(y) → mpfrw_div → to_bigdecimal`. | Small | Phase 1 | +| 2.2 | **Wire `root(x, n, P, gmp=True)`** | `mpfrw_rootn_ui(r, x_handle, n)` — single MPFR call. | Small | Phase 1 | +| 2.3 | **Wire `exp(x, P, gmp=True)`** | `mpfrw_exp(r, x_handle)` — single MPFR call. | Small | Phase 1 | +| 2.4 | **Wire `ln(x, P, gmp=True)`** | `mpfrw_log(r, x_handle)` — single MPFR call. | Small | Phase 1 | +| 2.5 | **Wire `sin(x, P, gmp)`, `cos`, `tan`** | `mpfrw_sin`, `mpfrw_cos`, `mpfrw_tan` — each single call. | Small | Phase 1 | +| 2.6 | **Wire `pi(P, gmp=True)`** | `mpfrw_const_pi(r)` — single MPFR call. No Chudnovsky needed. | Small | Phase 1 | +| 2.7 | **Instance method wrappers** | Update `self.sqrt()`, `self.exp()`, etc. to accept `gmp: Bool = False`. | Small | 2.6 | +| 2.8 | **Comprehensive test suite** | For each function × {gmp=True, gmp=False}: verify agreement to requested precision. Edge cases: 0, negative, very large/small. Compare against WolframAlpha. | Large | 2.7 | +| 2.9 | **Benchmark all operations** | Table: operation × precision (28, 100, 500, 1000, 5000) × gmp. Include conversion overhead. | Medium | 2.8 | + +**Note**: Because each MPFR operation is a single C call, Phase 2 is significantly +simpler than it would be with `mpz_t` (where we'd need to implement Taylor series, +Chudnovsky, and Newton iteration ourselves in Mojo using GMPDecimal arithmetic). + +**Deliverable**: All iterative BigDecimal functions support `gmp=True`. + +**Checkpoint test**: `pi(precision=10000, gmp=True)` matches known π digits and runs +significantly faster than `pi(precision=10000)`. + +### Phase 3: BigInt GMP Backend (Deferred) + +**Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred because integer +operations are less user-facing. + +| Step | Task | Detail | Effort | Deps | +| ---- | ----------------------------------------------- | ---------------------------------------------------------------------- | ------ | ------- | +| 3.1 | **Word import/export in C wrapper** | Complete `gmpw_import_u32_le` / `gmpw_export_u32_le`. Test round-trip. | Small | Phase 1 | +| 3.2 | **Create `src/decimo/bigint/gmp_backend.mojo`** | GMPBigInt struct. Convert BigInt ↔ GMP via O(n) word import/export. | Medium | 3.1 | +| 3.3 | **Add `gmp: Bool = False` to BigInt ops** | On `multiply`, `floor_divide`, `sqrt`, `gcd`, `power` free functions. | Medium | 3.2 | +| 3.4 | **Tests and benchmarks** | Verify correctness. Confirm 7–18× speedup. | Medium | 3.3 | + +### Phase 4: Polish, Distribution & Future + +| Step | Task | Effort | Deps | +| ---- | ------------------------------------------------------------------------ | ------ | ---------------- | +| 4.1 | **Cross-platform wrapper compilation (Linux)** | Small | Phase 1 | +| 4.2 | **Pre-built wrapper binaries (macOS ARM64, Linux x86)** | Medium | 4.1 | +| 4.3 | **CLI `--gmp` flag** | Medium | Phase 2 | +| 4.4 | **GMPDecimal expression chaining** (user-facing) | Medium | Phase 2 | +| | ↳ `gx = GMPDecimal(x); r = gx.sqrt().multiply(gx.exp()).to_bigdecimal()` | | | +| 4.5 | **Documentation: user guide for GMP/MPFR integration** | Small | Phase 2 | +| 4.6 | **Pure-Mojo runtime detection (when DLHandle available)** | Medium | Mojo improvement | +| 4.7 | **GMP rational numbers (`mpq_t`) for exact division** | Medium | Future | + +### Estimated Effort Summary + +| Phase | Scope | Steps | Complexity | Priority | +| --------------------------- | ---------------------------------------------- | ----- | --------------------------------------------- | -------- | +| Phase 1: Foundation + sqrt | MPFR wrapper, GMPDecimal, runtime detect, sqrt | 11 | Medium — most work is C wrapper + plumbing | **HIGH** | +| Phase 2: All BigDecimal ops | divide, exp, ln, trig, pi, root | 9 | **Low** — each op is a thin wrapper over MPFR | **HIGH** | +| Phase 3: BigInt backend | Word import/export, mpz_t dispatch | 4 | Medium | Deferred | +| Phase 4: Polish | CLI, cross-platform, docs, chaining | 7 | Varies | Low | + +--- + +## 14. Appendix: Prototype Results + +### 14.1 Files Created + +| File | Purpose | Status | +| ----------------------------------- | ------------------------- | -------------------------------------- | +| `temp/gmp/gmp_wrapper.c` | C wrapper library source | ✓ Complete | +| `temp/gmp/libgmp_wrapper.dylib` | Compiled shared library | ✓ Built | +| `temp/gmp/test_gmp_ffi.mojo` | FFI test suite (18 tests) | ✓ 14/18 pass (4 wrong expected values) | +| `temp/gmp/bench_gmp_vs_decimo.mojo` | GMP benchmark suite | ✓ Complete | +| `temp/gmp/build_and_test.sh` | Build and test automation | ✓ Complete | + +### 14.2 C Wrapper API Summary + +```c +// Lifecycle +int gmpw_init(void); // → handle +void gmpw_clear(int handle); + +// String I/O +int gmpw_set_str(int handle, const char* str, int base); +int gmpw_get_str_buf(int handle, char* buf, int bufsize, int base); + +// Arithmetic +void gmpw_add(int r, int a, int b); +void gmpw_sub(int r, int a, int b); +void gmpw_mul(int r, int a, int b); +void gmpw_fdiv_q(int r, int a, int b); // Floor division +void gmpw_tdiv_q(int r, int a, int b); // Truncate division +void gmpw_mod(int r, int a, int b); +void gmpw_pow_ui(int r, int base, unsigned long exp); +void gmpw_powm(int r, int base, int exp, int mod); // Modular exponentiation +void gmpw_sqrt(int r, int a); +void gmpw_gcd(int r, int a, int b); +void gmpw_neg(int r, int a); +void gmpw_abs(int r, int a); + +// Comparison +int gmpw_cmp(int a, int b); +int gmpw_sign(int handle); +int gmpw_sizeinbase10(int handle); + +// Bitwise +void gmpw_and(int r, int a, int b); +void gmpw_or(int r, int a, int b); +void gmpw_xor(int r, int a, int b); +void gmpw_lshift(int r, int a, unsigned long bits); +void gmpw_rshift(int r, int a, unsigned long bits); + +// Binary word I/O +void gmpw_import_u32_le(int handle, const uint32_t* words, int count); +int gmpw_export_u32_le(int handle, uint32_t* out, int max_words); +int gmpw_export_u32_count(int handle); + +// Meta +int gmpw_available(void); +const char* gmpw_version(void); +``` + +### 14.3 Sample Mojo FFI Calls + +```mojo +from sys.ffi import external_call, c_int + +# ⚠️ SAFE pattern: pass length explicitly to avoid use-after-free (see Section 15.1) +def gmp_set_str(h: c_int, s: String): + var slen = c_int(len(s)) + var ptr = Int(s.unsafe_ptr()) + _ = external_call["gmpw_set_str_n", c_int, c_int, Int, c_int](h, ptr, slen) + +# ❌ UNSAFE pattern: DO NOT USE — String may be freed before external_call executes +# def gmp_set_str(h: c_int, s: String): +# _ = external_call["gmpw_set_str", c_int, c_int, Int](h, Int(s.unsafe_ptr())) + +def gmp_init() -> c_int: + return external_call["gmpw_init", c_int]() + +def gmp_clear(h: c_int): + external_call["gmpw_clear", NoneType, c_int](h) + +def gmp_add(r: c_int, a: c_int, b: c_int): + external_call["gmpw_add", NoneType, c_int, c_int, c_int](r, a, b) +``` + +### 14.4 Benchmark Data Table + +**GMP absolute timings** (per operation, macOS ARM64, Apple M-series): + +| Digits | Multiply | Add | Divide | Sqrt | GCD | +| ------- | -------- | ------ | ------ | ------ | ------ | +| 10 | 5 ns | 7 ns | — | 8 ns | 8 ns | +| 100 | 23 ns | 8 ns | 67 ns | 114 ns | 50 ns | +| 1,000 | 1 µs | 20 ns | 818 ns | 722 ns | 169 ns | +| 10,000 | 50 µs | 226 ns | 30 µs | 21 µs | 2 µs | +| 100,000 | 939 µs | 2 µs | 740 µs | 642 µs | 324 µs | + +--- + +## 15. Lessons Learned from Prototype Development + +This section documents issues encountered during the GMP prototype build-test cycle +and their fixes. **Read this before starting implementation** — every item below +cost significant debugging time and will recur if not handled. + +### 15.1 CRITICAL: Mojo String Use-After-Free in FFI Calls + +**Symptom**: GMP's `mpz_set_str()` silently fails (returns -1), leaving the mpz value +as 0. Subsequent operations on zero values trigger `__gmp_overflow_in_mpz` in division +or crash with corrupt data. + +**Root cause**: When you write: + +```mojo +def gmp_set_str(h: c_int, s: String): + _ = external_call["gmpw_set_str", c_int, c_int, Int](h, Int(s.unsafe_ptr())) +``` + +Mojo's optimizer may drop the `String` parameter `s` (freeing its backing buffer) before +`external_call` actually reads the pointer. Under heavy allocation pressure (millions of +BigInt/BigDecimal operations), the freed buffer is immediately reused by the allocator, +overrwriting the null terminator. GMP then reads past the intended string into adjacent +memory, encounters non-digit bytes, and returns error code -1. + +**How we found it**: The crash only appeared in the full benchmark binary after +millions of prior allocations, never in small test files. Adding `fprintf(stderr)` +logging to the C wrapper revealed that: + +- `strlen()` on the pointer showed 109 instead of expected 100 +- `mpz_set_str` returned -1 (failure), not 0 (success) +- The extra bytes were from adjacent String buffers reusing freed memory + +**Fix**: Pass the string length explicitly and copy in C: + +```c +// In gmp_wrapper.c +int gmpw_set_str_n(int h, const char *str, int len) { + char *buf = (char *)malloc(len + 1); + memcpy(buf, str, len); + buf[len] = '\0'; + int rc = mpz_set_str(handles[h], buf, 10); + free(buf); + return rc; +} +``` + +```mojo +# In Mojo +def gmp_set_str(h: c_int, s: String): + var slen = c_int(len(s)) # Extract length BEFORE ptr + var ptr = Int(s.unsafe_ptr()) + _ = external_call["gmpw_set_str_n", c_int, c_int, Int, c_int](h, ptr, slen) +``` + +**Rule**: **ALWAYS use `gmpw_set_str_n` with explicit length for any String → C FFI +call.** Never rely on null-termination of Mojo `String.unsafe_ptr()` in `external_call`. + +**Broader implication**: This affects ALL FFI calls that pass Mojo `String` pointers. +Every such call in the codebase must either: + +1. Use the length-aware C variant (recommended), or +2. Store the String in a local variable AND ensure no other allocation occurs between + `unsafe_ptr()` extraction and the `external_call` (fragile, not recommended) + +### 15.2 `mojo run` Does Not Support `-Xlinker` Flags + +**Symptom**: `mojo run file.mojo -Xlinker -lgmp_wrapper` ignores the linker flags, +causing "undefined symbol" errors at runtime. + +**Cause**: The JIT (`mojo run`) does not process `-Xlinker` flags. Only `mojo build` +(AOT compilation) supports them. + +**Fix**: Always use `mojo build` → run the executable. Never use `mojo run` for +GMP-linked code. + +```bash +# ✓ Correct +pixi run mojo build -Xlinker -L... -Xlinker -lgmp_wrapper -o program program.mojo +./program + +# ✗ Wrong — linker flags silently ignored +pixi run mojo run -Xlinker -L... program.mojo +``` + +### 15.3 Mojo 0.26.2 Syntax Differences + +Several Mojo syntax changes from earlier versions caused compile errors: + +| Feature | Old Syntax | Mojo 0.26.2 Syntax | +| ------------------ | ---------------------------------- | ---------------------------------------------- | +| Compile-time const | `alias X = 7` | `comptime X = 7` | +| String slicing | `s[:10]` | `s[byte=:10]` | +| UInt → Float64 | `Float64(some_uint)` | `Float64(Int(some_uint))` | +| UnsafePointer | `UnsafePointer[Byte](address=ptr)` | `UnsafePointer[Byte](unsafe_from_address=ptr)` | + +### 15.4 GMP `__gmp_overflow_in_mpz` Means Bad Input, Not Overflow + +Despite the name, `__gmp_overflow_in_mpz` typically means the `mpz_t` struct has +corrupted metadata (alloc=0, size=0, or invalid limb pointer), NOT that a computation +result was too large. If you see this error: + +1. **First check**: Did `mpz_set_str` fail? (Return value -1 means the string was invalid.) +2. **Second check**: Was the handle cleared and reused? +3. **Third check**: Was the handle ever initialized? + +In our case it was always (1) — the string pointer was dangling. + +### 15.5 Handle Pool Sizing + +The C wrapper uses a static array of `MAX_HANDLES = 4096` handles. For Phase 1 (sqrt), +each Newton iteration typically uses 3–5 temporary handles, so 4096 is plenty. But for +Phase 2 (complex expression trees), monitor handle usage. Consider: + +- Adding `gmpw_handles_in_use()` diagnostic function +- Expanding to dynamic allocation if 4096 is insufficient +- Ensuring every code path calls `gmpw_clear()` (RAII via `GMPDecimal.__del__`) + +### 15.6 Build Command Template + +The full build command for any GMP-linked Mojo file on macOS: + +```bash +pixi run mojo build \ + -Xlinker -L/path/to/temp/gmp -Xlinker -lgmp_wrapper \ + -Xlinker -L/opt/homebrew/lib -Xlinker -lgmp \ + -Xlinker -rpath -Xlinker /path/to/temp/gmp \ + -Xlinker -rpath -Xlinker /opt/homebrew/lib \ + -I src \ + -o output_binary input_file.mojo +``` + +Key points: + +- `-I src` is needed for `from decimo.xxx import ...` to resolve +- Both `-L` (link-time) and `-rpath` (run-time) paths needed for each library +- Build must be run from the project root directory for module resolution +- The binary must be run from the directory containing `libgmp_wrapper.dylib` + (or use absolute rpath) + +### 15.7 Debugging Tip: Add `fprintf(stderr)` to C Wrapper + +When debugging FFI issues, temporarily add `fprintf(stderr)` logging to the C wrapper +functions. This is invaluable because Mojo's error reporting for FFI issues is limited. +The C wrapper can inspect raw pointer values, string contents, and GMP internal state +(`handles[h]->_mp_alloc`, `_mp_size`, `_mp_d`) that are invisible from the Mojo side. + +Always remove debug logging before benchmarking — `fprintf` has measurable overhead. + +--- + +## Summary + +GMP/MPFR integration is **technically proven and recommended** for Decimo, with BigDecimal +as the priority target using **MPFR** (not raw `mpz_t`): + +1. **MPFR for BigDecimal** — Uses `mpfr_t` (arbitrary-precision binary float) instead of + `mpz_t` + manual scale tracking. Every operation (sqrt, exp, ln, sin, cos, tan, π, + divide) is a **single MPFR call**. No Newton iteration or Taylor series implementation + needed from the Mojo side. The `GMPDecimal` struct has just one field (`handle`). +2. **`gmp: Bool = False`** — Simple opt-in parameter on compute-heavy functions. Defaults + to native Decimo for backward compatibility. Guard digits handle binary↔decimal rounding. +3. **Single binary, runtime detection** — The C wrapper uses `dlopen`/`dlsym` to load + MPFR lazily at runtime (verified working in Mojo 0.26.2). Users who never call + `gmp=True` pay zero cost. If MPFR isn't installed, a clear error message is raised. +4. **BigInt deferred** to Phase 3 (uses `mpz_t` with word import/export). +5. **CLI `--gmp` deferred** to Phase 4. +6. **Phase 2 is trivial** — Because MPFR has built-in functions, wiring each BigDecimal + operation is a thin wrapper (~5 lines each), not a reimplementation of algorithms. + +The prototype in `temp/gmp/` and the lessons in Section 15 provide a foundation for +implementation. Start with Phase 1, Step 1.1 (`brew install mpfr`). From 4704255be3a27f5a99f847aa38597557d344397f Mon Sep 17 00:00:00 2001 From: ZHU Yuhao Date: Fri, 3 Apr 2026 22:39:16 +0200 Subject: [PATCH 3/6] Update plan --- docs/plans/gmp_integration.md | 915 +++++++++++++++++++++------------- 1 file changed, 557 insertions(+), 358 deletions(-) diff --git a/docs/plans/gmp_integration.md b/docs/plans/gmp_integration.md index 4fbb5c4b..d84b951f 100644 --- a/docs/plans/gmp_integration.md +++ b/docs/plans/gmp_integration.md @@ -12,7 +12,7 @@ ### Verdict: FEASIBLE ✓ -GMP integration into Decimo is technically feasible and has been proven through a working prototype. A C wrapper approach (`gmp_wrapper.c`) successfully bridges Mojo's FFI layer to GMP, with all core arithmetic operations verified. +GMP integration into Decimo is technically feasible — I've proven it with a working prototype. A C wrapper (`gmp_wrapper.c`) bridges Mojo's FFI layer to GMP, and all core arithmetic operations pass. ### Key Findings @@ -29,24 +29,31 @@ GMP integration into Decimo is technically feasible and has been proven through ### Recommendation -Integrate GMP as an **optional backend for BigInt** (base-2^32) first, where data conversion -is cheap and measured speedups are 7–18× across operations. BigDecimal integration should -follow only for operations where GMP's speed advantage (15–353× in benchmarks) outweighs -the base-10⁹ ↔ binary conversion cost — primarily large multiplications, sqrt, and GCD at -1,000+ digits. +Two-type architecture: ---- +1. **BigFloat** — a new first-class MPFR-backed binary float type (like mpmath's `mpf` + but faster). Requires MPFR. Every operation is a single MPFR call. This is the fast + path for scientific computing at high precision. +2. **BigDecimal** — keeps its native base-10⁹ implementation. Gains an optional + `gmp=True` parameter for one-off acceleration of expensive operations (sqrt, exp, ln, + etc.) without changing types. No MPFR dependency. +3. **BigInt** — GMP backend deferred to Phase 3. Cheap O(n) word conversion, 7–18× + speedup measured. + +BigFloat and BigDecimal coexist because they solve different problems: BigFloat is fast +binary arithmetic (like mpmath); BigDecimal is exact decimal arithmetic (like Python +`decimal`). Both in one library, user picks the right tool. ## 2. Technical Feasibility Assessment ### 2.1 What Was Tested -A complete prototype was built in `temp/gmp/` consisting of: +A complete prototype was built in `temp/gmp/`: 1. **C wrapper** (`gmp_wrapper.c`): Handle-based API wrapping GMP's `mpz_t` operations -2. **Shared library** (`libgmp_wrapper.dylib`): Compiled C wrapper linked against GMP +2. **Shared library** (`libgmp_wrapper.dylib`): Compiled, linked against GMP 3. **Mojo FFI test** (`test_gmp_ffi.mojo`): 18 tests covering all operation categories -4. **Mojo benchmark** (`bench_gmp_vs_decimo.mojo`): Performance characterization across digit sizes +4. **Mojo benchmark** (`bench_gmp_vs_decimo.mojo`): Performance comparison across digit sizes ### 2.2 Prototype Results @@ -70,16 +77,15 @@ A complete prototype was built in `temp/gmp/` consisting of: ### 2.3 Why C Wrapper Is Required -Direct FFI to GMP is impractical in Mojo 0.26.2 because: +Direct FFI to GMP doesn't work in Mojo 0.26.2 because: 1. **No `DLHandle`**: Mojo 0.26.2 does not expose `DLHandle` for runtime dynamic library loading. Only `external_call` (compile-time/link-time binding) is available. 2. **`mpz_t` is a struct**: GMP's `mpz_t` is `__mpz_struct[1]` (a 1-element array used as a pass-by-reference trick). Mojo cannot directly represent or pass this. 3. **Pointer origin tracking**: `UnsafePointer` in Mojo 0.26.2 has strict origin tracking that prevents constructing pointers from raw integer addresses in non-`fn` contexts. 4. **GMP macros**: Many GMP operations (e.g., `mpz_init`, `mpz_clear`) are preprocessor macros, not true function symbols. -The C wrapper solves all these issues by exposing a flat, handle-based C API where Mojo passes and receives only primitive types (`Int32`, `Int`). - ---- +The C wrapper solves all of this by exposing a flat, handle-based C API where Mojo +passes and receives only primitive types (`Int32`, `Int`). ## 3. Mojo FFI Capabilities and Constraints @@ -115,8 +121,6 @@ var result = external_call["gmpw_add", NoneType, Int32, Int32, Int32](a, b, c) - **Global variables**: Would allow module-level GMP backend state - **Conditional compilation (`@static_if`)**: Would enable clean feature toggling ---- - ## 4. Architecture Analysis ### 4.1 Current Decimo Architecture @@ -142,31 +146,30 @@ mpf_t → Binary floating-point with configurable precision ### 4.3 Type Mapping -| Decimo Type | GMP Equivalent | Conversion Cost | Integration Priority | -| -------------- | ----------------------------------- | ----------------------------------------------------- | -------------------- | -| **BigInt** | `mpz_t` | **Low** — same binary base (2^32 → 2^64 word packing) | **Phase 1** (HIGH) | -| **BigUInt** | `mpz_t` (unsigned subset) | **High** — base-10^9 → binary conversion | Phase 3 (LOW) | -| **BigDecimal** | `mpf_t` or `mpz_t` + manual scaling | **High** — base conversion + scale management | Phase 2 (MEDIUM) | -| **Dec128** | Not applicable | N/A — fixed precision, already fast | Skip | +| Decimo Type | GMP Equivalent | Conversion Cost | Integration Priority | +| -------------------- | ------------------------------- | ----------------------------------------------------- | -------------------- | +| **BigFloat** *(new)* | `mpfr_t` | **None** — direct MPFR handle | **Phase 1** (HIGH) | +| **BigInt** | `mpz_t` | **Low** — same binary base (2^32 → 2^64 word packing) | **Phase 3** (MEDIUM) | +| **BigUInt** | `mpz_t` (unsigned subset) | **High** — base-10^9 → binary conversion | Skip | +| **BigDecimal** | via BigFloat (`gmp=True` sugar) | **High** — string conversion, but single round-trip | **Phase 2** (HIGH) | +| **Dec128** | Not applicable | N/A — fixed precision, already fast | Skip | -### 4.4 Where GMP Adds Value +### 4.4 Where GMP Helps -GMP's advantages over Decimo's native implementation: +GMP's advantages over my native implementation: 1. **Highly optimized assembly**: GMP includes hand-tuned ARM64 assembly for core operations 2. **Mature algorithm suite**: Decades of optimization (Schönhage-Strassen for huge multiplications, sub-quadratic GCD, etc.) 3. **Large-number regime**: GMP's algorithmic cutoffs are well-tuned for every platform 4. **Additional algorithms**: FFT-based multiplication, extended GCD, primality testing, etc. -Where GMP does **not** add value: +Where GMP does **not** help: 1. **Small numbers** (<50 digits): FFI overhead dominates 2. **Dec128**: Fixed 128-bit operations are already optimal with native code 3. **Base-10 operations**: GMP works in binary; base-10 output requires conversion 4. **Decimal arithmetic**: GMP has no native decimal type; scale management would remain in Mojo ---- - ## 5. Performance Comparison: GMP vs Decimo All benchmarks run side-by-side in a single binary (`bench_gmp_vs_decimo.mojo`), compiled @@ -249,7 +252,7 @@ Decimo currently uses the Euclidean algorithm. This is the smallest gap of all o ### 5.2 BigDecimal (base-10⁹) vs GMP Integer Equivalents -GMP has no native decimal type. To compare fairly, we benchmark GMP performing the +GMP has no native decimal type. To compare fairly, I benchmark GMP performing the **equivalent integer computation** that a GMP-backed BigDecimal would perform: - **Multiply**: GMP integer multiply on same-digit-count coefficients @@ -257,7 +260,7 @@ GMP has no native decimal type. To compare fairly, we benchmark GMP performing t - **Division (1/3) at precision P**: GMP `(1 × 10^P) / 3` vs Decimo `BigDecimal.true_divide(…, precision=P)` These ratios represent the **best-case speedup** from a GMP backend, before -accounting for base-10⁹ ↔ binary conversion overhead. +accounting for the base-10⁹ ↔ binary conversion overhead. #### BigDecimal Multiplication (n sig-digits × n sig-digits) @@ -325,7 +328,7 @@ The C wrapper call path has fixed overhead: 2. Handle array lookup: ~1–2 ns 3. Total fixed overhead: **~3–5 ns per call** -This means: +In practice: - At **10 digits**: overhead is 30–100% of operation time → **GMP not beneficial** - At **100 digits**: overhead is 5–10% → **GMP moderately beneficial** @@ -334,21 +337,19 @@ This means: **Recommended crossover threshold**: Use GMP backend for numbers with **≥64 words** (~600 digits for BigInt). Below this, Decimo's native implementation avoids FFI overhead. -### 5.5 Technical Note: Mojo String FFI Safety +### 5.5 Mojo String FFI Safety Issue -During benchmarking, we discovered a Mojo compiler issue where the optimizer may drop a +During benchmarking, I hit a Mojo compiler issue where the optimizer may drop a `String` parameter's backing buffer before the `external_call` using its `unsafe_ptr()` completes. Under heavy allocation pressure, the freed buffer gets reused immediately, causing GMP's `mpz_set_str` to read corrupted data. -**Workaround**: A length-aware C function `gmpw_set_str_n(handle, ptr, len)` was added that +**Workaround**: I added a length-aware C function `gmpw_set_str_n(handle, ptr, len)` that copies exactly `len` bytes into a null-terminated buffer before calling GMP, making the FFI call safe regardless of the caller's buffer lifetime. Any production integration must use the length-based variant (`gmpw_set_str_n`) or ensure String buffers remain live through explicit variable binding. ---- - ## 6. Operation Coverage Mapping ### 6.1 BigInt Operations → GMP Functions @@ -406,13 +407,11 @@ These must remain native Mojo: - CLI calculator evaluation - Dec128 operations (fixed-precision) ---- - ## 7. Data Conversion Between Decimo and GMP ### 7.1 BigInt ↔ GMP (Efficient) -BigInt uses base-2^32 `List[UInt32]` in little-endian order — essentially the same representation as GMP's `mpz_t` but with 32-bit limbs instead of 64-bit. +BigInt uses base-2^32 `List[UInt32]` in little-endian order — basically the same thing as GMP's `mpz_t` but with 32-bit limbs instead of 64-bit. **Conversion approach**: @@ -453,7 +452,7 @@ int gmpw_export_u32_le(int handle, uint32_t* out_buf, int max_count); ### 7.2 BigUInt ↔ GMP (Expensive) -BigUInt uses base-10^9 — a fundamentally different number base than GMP's binary format. +BigUInt uses base-10^9 — a completely different number base from GMP's binary format. **Conversion options**: @@ -468,7 +467,7 @@ BigUInt uses base-10^9 — a fundamentally different number base than GMP's bina - For <1000 words: Horner's method (simpler, O(n²) acceptable) - For ≥1000 words: Divide-and-conquer using GMP's own multiply for combining -**Break-even analysis**: If we assume base conversion costs ~O(n²) for n words: +**Break-even analysis**: Assuming base conversion costs ~O(n²) for n words: - At 100 words (~900 digits): conversion ~10 µs, GMP multiply saves ~4 µs → **NOT worth it** - At 1000 words (~9000 digits): conversion ~1 ms, GMP multiply saves ~150 µs → **NOT worth it for single operation** @@ -495,13 +494,11 @@ BigUInt uses base-10^9 — a fundamentally different number base than GMP's bina └──────────────────────────────────────┘ ``` ---- - ## 8. GMP Detection and Fallback Strategy ### 8.1 Detection Approaches -Since Mojo 0.26.2 lacks `DLHandle` for runtime library probing, detection must be **build-time** or **semi-static**: +Since Mojo 0.26.2 doesn't have `DLHandle` for runtime library probing, detection has to be **build-time** or **semi-static**: #### Option A: Compile-Time Feature Flag (Recommended for Now) @@ -578,229 +575,274 @@ User calls BigInt.multiply(a, b): → Use native Mojo implementation (current code) ``` ---- +## 9. Type Design: BigFloat and BigDecimal + +Decimo gets two arbitrary-precision types with different backends and semantics. -## 9. API Design: `gmp: Bool = False` Parameter +### 9.1 Two Types, Two Number Systems -### 9.1 Design Rationale +| | **BigDecimal** | **BigFloat** | +| ------------------ | -------------------------------- | -------------------------------------------------- | +| Internal base | 10⁹ (decimal) | binary (MPFR `mpfr_t`) | +| Python analogue | `decimal.Decimal` | `mpmath.mpf` | +| `sqrt(0.01)` | `0.1` (exact) | `0.1000...0` (padded) | +| Dependency | None | MPFR required | +| Speed at high prec | Native Mojo | 10–400× faster (MPFR) | +| Precision unit | Decimal digits | Bits (user specifies digits, converted internally) | +| Rounding | Decimal-exact where possible | Correctly rounded per IEEE binary semantics | +| Primary use case | Finance, human-friendly decimals | Scientific computing, transcendentals | -Since Mojo lacks global variables, a module-level context or engine setting is not -possible. Instead, we add a **`gmp: Bool = False`** keyword parameter to compute-heavy -BigDecimal functions. This is: +They coexist in the same library. User picks the right tool for the job. -- **Explicit** — users opt in per-call; no hidden magic -- **Backward-compatible** — defaults to `False`, existing code unchanged -- **Composable** — users pass `gmp=True` down through complex expressions like π calculation -- **Simple** — a Bool is clearer than an engine enum for two choices +### 9.2 Why Two Types Instead of a Backend Swap -### 9.2 Which Functions Get the `gmp` Parameter +`mpmath` pulls off a transparent backend swap (`MPZ = gmpy.mpz or int`) because: -Only **iterative compute-heavy** operations — not add/sub/multiply (which are building blocks -called internally and whose overhead is handled by the GMPDecimal struct). +- Python is dynamically typed — the swap is invisible at runtime +- `mpmath` already uses binary representation internally — swapping `int` for `mpz_t` + doesn't change semantics -| Function | Gets `gmp`? | Reason | -| ------------------------------------------ | ----------- | ---------------------------------------------- | -| `true_divide(x, y, precision, gmp)` | ✓ Yes | Long division at high precision benefits | -| `sqrt(x, precision, gmp)` | ✓ Yes | Newton iteration: 86–353× gap | -| `root(x, n, precision, gmp)` | ✓ Yes | Newton iteration like sqrt | -| `exp(x, precision, gmp)` | ✓ Yes | Taylor series with many multiplications | -| `ln(x, precision, gmp)` | ✓ Yes | Atanh-based approach with iterative multiplies | -| `sin(x, precision, gmp)` | ✓ Yes | Taylor series | -| `cos(x, precision, gmp)` | ✓ Yes | Taylor series | -| `tan(x, precision, gmp)` | ✓ Yes | Uses sin/cos | -| `pi(precision, gmp)` | ✓ Yes | Chudnovsky binary splitting: many large muls | -| `BigDecimal.__add__`, `__sub__`, `__mul__` | ✗ No | Handled internally by GMPDecimal | -| `BigDecimal.__floordiv__`, `round` | ✗ No | Digit manipulation, no heavy arithmetic | +I can't do this in Decimo because: -### 9.3 Proposed Signatures +1. **Mojo is statically typed** — users must choose their type at compile time. +2. **BigDecimal uses base-10⁹** — swapping BigUInt for `mpz_t` would change decimal + semantics (e.g., `sqrt(0.01)` behavior). +3. **They solve different problems** — BigDecimal is for exact decimal arithmetic; + BigFloat is for fast scientific computation. + +Python itself acknowledges this split: `decimal.Decimal` and `mpmath.mpf` are separate +packages. I'm combining both under one import. + +### 9.3 BigFloat: MPFR-Backed Binary Float ```mojo -# Free functions (primary API) -def true_divide(x: BigDecimal, y: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def sqrt(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def root(x: BigDecimal, n: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def exp(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def ln(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def sin(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def cos(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def tan(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal: -def pi(precision: Int, gmp: Bool = False) raises -> BigDecimal: - -# Instance methods (delegate to free functions) -def sqrt(self, precision: Int = PRECISION, gmp: Bool = False) raises -> Self: - return exponential.sqrt(self, precision, gmp) +struct BigFloat: + """Arbitrary-precision binary floating-point, backed by MPFR. + + Like mpmath's mpf but faster — every operation is a single MPFR call. + Requires MPFR to be installed. If MPFR is unavailable, construction raises. + """ + var handle: Int32 # MPFR handle (via C wrapper handle pool) ``` +**BigFloat requires MPFR. No fallback.** + +I considered a pure-Mojo BigInt-based fallback (like mpmath's pure-Python mode), but: + +- mpmath's fallback arithmetic is ~30K lines of Python for the math functions alone + (Taylor series, argument reduction, binary splitting, etc.) +- Reimplementing all that in Mojo defeats the purpose — BigFloat's value proposition + IS speed via MPFR +- If MPFR isn't installed, users have BigDecimal — it already does sqrt, exp, ln, sin, + cos, tan, pi, all in pure Mojo. Just slower at high precision. +- A half-working BigFloat (add/sub/mul work but sqrt/exp/ln don't) would be confusing + +So: BigFloat without MPFR is like numpy without its C core. Users without MPFR use +BigDecimal. This is an honest, clean boundary. + ### 9.4 Why MPFR, Not `mpz_t` with Manual Scale -GMP provides three numeric types: +GMP has three numeric types: -| Type | What it stores | Mojo analogue | Use for BigDecimal? | -| -------- | --------------------------------------------------- | ------------- | --------------------------------------------------------------- | -| `mpz_t` | Arbitrary-precision **integer** | BigInt | ✗ Poor — must manually track scale, sign, scale-up for division | -| `mpf_t` | Arbitrary-precision **binary float** | (none) | △ Works but legacy, no correct rounding | -| `mpfr_t` | Arbitrary-precision **binary float** (MPFR library) | BigDecimal | **✓ Best** — correctly rounded, built-in exp/ln/sin/cos/π | +| Type | What it stores | Mojo analogue | Use for BigFloat? | +| -------- | --------------------------------------------------- | ------------- | ------------------------------------------------------------------ | +| `mpz_t` | Arbitrary-precision **integer** | BigInt | ✗ Poor — must manually track scale, sign, implement all algorithms | +| `mpf_t` | Arbitrary-precision **binary float** (GMP built-in) | — | △ Works but legacy, no correct rounding | +| `mpfr_t` | Arbitrary-precision **binary float** (MPFR library) | **BigFloat** | **✓ Best** — correctly rounded, built-in exp/ln/sin/cos/π | -If we used `mpz_t` (integer only), we'd need to: +With `mpz_t`, I'd have to: -- Track `scale` and `sign` separately in GMPDecimal (3 fields) -- Manually scale up the numerator by 10^N before integer division -- Implement Newton iterations for sqrt, Taylor series for exp/ln/trig — **ourselves** +- Store `(sign, mantissa: mpz_t, exponent: Int)` and manage scale manually +- Scale up numerators by 10^N before integer division +- Implement Newton iterations for sqrt, Taylor series for exp/ln/trig — **myself** - Handle scale alignment for addition/subtraction +- That's basically reimplementing mpmath in Mojo (~30K lines) -With **MPFR** (`mpfr_t`), all of this is handled internally: +With **MPFR**, all of this is handled internally: - Division: `mpfr_div(r, a, b, MPFR_RNDN)` — one call - Sqrt: `mpfr_sqrt(r, a, MPFR_RNDN)` — one call -- Exp: `mpfr_exp(r, a, MPFR_RNDN)` — one call, correctly rounded -- Ln: `mpfr_log(r, a, MPFR_RNDN)` — one call +- Exp/Ln: `mpfr_exp`, `mpfr_log` — one call each, correctly rounded - Sin/Cos/Tan: `mpfr_sin`, `mpfr_cos`, `mpfr_tan` — one call each - π: `mpfr_const_pi(r, MPFR_RNDN)` — one call, arbitrary precision -**MPFR** is a separate library built on top of GMP. It is the standard for -arbitrary-precision floating-point and is what Python's `mpmath` (via `gmpy2`) -uses under the hood. It is widely available: `brew install mpfr` (macOS), -`apt install libmpfr-dev` (Linux). +MPFR sits on top of GMP and is the industry standard for arbitrary-precision +floating-point. Easy to get: `brew install mpfr` (macOS), `apt install libmpfr-dev` +(Linux). -### 9.5 The GMPDecimal Intermediate Struct (MPFR-Based) - -Because MPFR natively handles sign, scale, and precision, the struct simplifies -to **a single field** — the MPFR handle: +### 9.5 BigFloat API ```mojo -struct GMPDecimal: - """A decimal value held in MPFR floating-point representation. - - Wraps a single mpfr_t handle. All arithmetic, sqrt, exp, ln, trig - are single MPFR calls. Convert to/from BigDecimal only at entry/exit. - """ - var handle: Int32 # MPFR handle (via C wrapper) - - def __init__(out self, bd: BigDecimal, precision: Int): - """Convert BigDecimal → GMPDecimal. - - precision: decimal digits wanted → converted to bits internally. - """ - # 1 decimal digit ≈ 3.322 bits; add guard bits - var bits = precision * 4 + 64 - self.handle = mpfrw_init(bits) - # Pass full decimal string (e.g. "-3.14159") directly - mpfrw_set_str(self.handle, str(bd)) - - def to_bigdecimal(self, precision: Int) -> BigDecimal: - """Convert GMPDecimal → BigDecimal. - - Exports MPFR value as decimal string, truncated to `precision` digits. - """ - var s = mpfrw_get_str(self.handle, precision) - return BigDecimal(s) - - # Arithmetic: each is a single MPFR call - def multiply(self, other: GMPDecimal) -> GMPDecimal: ... - def add(self, other: GMPDecimal) -> GMPDecimal: ... - def divide(self, other: GMPDecimal) -> GMPDecimal: ... - def sqrt(self) -> GMPDecimal: ... - def exp(self) -> GMPDecimal: ... - def ln(self) -> GMPDecimal: ... - def sin(self) -> GMPDecimal: ... - def cos(self) -> GMPDecimal: ... - def pi(precision: Int) -> GMPDecimal: ... # static - - def __del__(owned self): - """RAII: free MPFR handle on destruction.""" - mpfrw_clear(self.handle) +# Construction +var x = BigFloat("3.14159", precision=1000) # from string +var y = BigFloat(bd, precision=1000) # from BigDecimal +var z = BigFloat.pi(precision=5000) # constant + +# Arithmetic (returns BigFloat) +var r = x + y +var r = x * y +var r = x / y + +# Transcendentals (single MPFR call each) +var r = x.sqrt() +var r = x.exp() +var r = x.ln() +var r = x.sin() +var r = x.cos() +var r = x.tan() +var r = x.root(n) +var r = x.power(y) + +# Conversion back to BigDecimal +var bd = r.to_bigdecimal(precision=1000) + +# Chained computation (stays in MPFR the whole time) +var result = ( + BigFloat("23", P).sqrt() + .multiply(BigFloat("2.1", P).exp()) + .divide(BigFloat.pi(P)) + .to_bigdecimal(P) +) +# Cost: 2 string→MPFR conversions in, 5 MPFR calls, 1 MPFR→string out +# No intermediate decimal conversions ``` -Compare the old vs. new struct: +Precision is specified in decimal digits but converted to bits internally: +`bits = ceil(digits × 3.322) + guard_bits`. Guard bits (64 extra) ensure the +requested decimal digits are all correct after binary→decimal conversion. -| Aspect | Old (mpz_t + manual scale) | New (MPFR) | -| --------------- | ------------------------------------------- | ----------------------------- | -| Fields | `handle`, `scale`, `sign` (3 fields) | `handle` (1 field) | -| Division | Scale up by 10^N, `mpz_tdiv_q`, track scale | `mpfr_div(r, a, b, RNDN)` | -| Sqrt | Newton iteration in Mojo using GMPDecimal | `mpfr_sqrt(r, a, RNDN)` | -| Exp | Taylor series in Mojo using GMPDecimal | `mpfr_exp(r, a, RNDN)` | -| Ln | Atanh series in Mojo using GMPDecimal | `mpfr_log(r, a, RNDN)` | -| Sin/Cos/Tan | Taylor series in Mojo | `mpfr_sin` / `mpfr_cos` / ... | -| π | Chudnovsky in Mojo using GMPDecimal | `mpfr_const_pi(r, RNDN)` | -| Sign handling | Manual XOR | Automatic | -| Scale alignment | Manual multiply by 10^diff | Automatic | -| Rounding | Manual truncation | Correctly rounded (MPFR) | +### 9.6 BigDecimal `gmp=True` Convenience -**Data flow for `gmp=True`**: +BigDecimal keeps an optional `gmp=True` parameter for **one-off acceleration** +without changing types. Internally it does `BigDecimal → BigFloat → compute → BigDecimal`: -```txt -sqrt(x: BigDecimal, precision=1000, gmp=True): - 1. gx = GMPDecimal(x, precision=1010) ← ONE conversion: string → mpfr_t - 2. mpfr_sqrt(result, gx, MPFR_RNDN) ← ONE GMP call (entire computation) - 3. return result.to_bigdecimal(1000) ← ONE conversion: mpfr_t → string +```mojo +# Single-call sugar (user stays in BigDecimal world) +var r = sqrt(x, precision=5000, gmp=True) +var r = exp(x, precision=5000, gmp=True) +var r = pi(precision=10000, gmp=True) + +# Equivalent to: +var r = BigFloat(x, 5010).sqrt().to_bigdecimal(5000) ``` -For a chained expression like `sqrt(23) * exp(2.1) / pi()`: +Only **iterative compute-heavy** operations get the `gmp` parameter: -```txt -var a = GMPDecimal(BigDecimal("23"), prec) ← 1 conversion -var b = GMPDecimal(BigDecimal("2.1"), prec) ← 1 conversion -var sa = a.sqrt() ← mpfr_sqrt (one call) -var eb = b.exp() ← mpfr_exp (one call) -var p = GMPDecimal.pi(prec) ← mpfr_const_pi (one call) -var r = sa.multiply(eb).divide(p) ← mpfr_mul + mpfr_div -return r.to_bigdecimal(precision) ← 1 conversion -``` +| Function | Gets `gmp`? | Reason | +| ------------------------------- | ----------- | --------------------------------------- | +| `sqrt`, `root` | ✓ Yes | Newton iteration → single MPFR call | +| `exp`, `ln` | ✓ Yes | Taylor/atanh series → single MPFR call | +| `sin`, `cos`, `tan` | ✓ Yes | Taylor series → single MPFR call | +| `pi` | ✓ Yes | Chudnovsky → `mpfr_const_pi` | +| `true_divide` | ✓ Yes | Long division at high precision | +| `__add__`, `__sub__`, `__mul__` | ✗ No | Too cheap — FFI overhead not worth it | +| `round`, `__floordiv__` | ✗ No | Digit manipulation, no heavy arithmetic | -Total: 3 conversions (2 in + 1 out) + 5 MPFR calls. No Newton iterations, -no Taylor series, no manual scale tracking. +**When to use `gmp=True` vs BigFloat directly:** -### 9.6 Behavioral Differences Between `gmp=False` and `gmp=True` +- Single operation on BigDecimal inputs? → `gmp=True` (one liner, stays in BigDecimal) +- Chain of operations? → Use BigFloat directly (avoids repeated conversion) -**Important**: MPFR is a **binary** floating-point internally (mantissa stored in base-2). -This means: +### 9.7 Behavioral Differences: Binary vs. Decimal -1. **Precision is in bits, not digits** — We convert: `bits = ceil(digits × 3.322)`. To - guarantee N correct decimal digits, we request ~N×4 bits (generous guard). +MPFR is a **binary** float internally. What this means: -2. **Last 1–2 decimal digits may differ** — Binary↔decimal rounding means the last few - digits of `gmp=True` output may not match `gmp=False` output exactly. The numerical - value will agree to the requested precision. +1. **Last 1–2 decimal digits may differ** between `gmp=True` and `gmp=False`. The + numerical value agrees to the requested precision, but the least significant digits + can differ due to binary↔decimal rounding. -3. **Fix: guard digits** — Compute internally with `precision + 10` extra digits, then - truncate the result to the requested precision. This is standard practice (Python's - `decimal` module does the same). +2. **Guard digits handle this** — I compute with `precision + 10` extra digits, then + truncate. Standard practice (Python's `decimal` module does the same). -4. **MPFR guarantees correct rounding** — Every operation is correctly rounded to the - working precision. This is actually *stronger* than Decimo's native implementation +3. **MPFR guarantees correct rounding** — every operation is correctly rounded to the + working binary precision. Actually *stronger* than Decimo's native implementation which may accumulate rounding errors across Newton iterations. -This is why `gmp` defaults to `False` — so existing behavior is preserved and users -explicitly opt into the MPFR path when they want speed. In practice, the numerical -results from MPFR are at least as accurate as the native implementation. +4. **Semantic difference with BigFloat** — `BigFloat("0.01").sqrt()` returns + `0.1000000...0` (binary representation of 0.1). `BigDecimal("0.01").sqrt()` returns + exactly `0.1`. For scientific computing this doesn't matter. For financial/display + work, use BigDecimal. -### 9.6 CLI Integration (Deferred) +### 9.8 When to Use Which -CLI integration is deferred to a later phase. When implemented, a `--gmp` flag on the -command line would set `gmp=True` for all compute-heavy operations in the session. -The evaluator would pass this flag down through the function dispatch. +| Scenario | Use | Why | +| ---------------------------------------- | ------------------------ | ------------------------------------------------- | +| Financial calculations | BigDecimal | Exact decimal representation, no binary surprises | +| Display-friendly output | BigDecimal | `0.1` stays `0.1`, not `0.10000...00001` | +| Fast transcendentals (exp, ln, sin, cos) | BigFloat | Single MPFR call, 10–353× faster | +| Scientific computing | BigFloat | MPFR's correctly-rounded arithmetic | +| Chained high-precision computation | BigFloat | Stays in MPFR, no intermediate conversions | +| One-off speedup without changing types | `gmp=True` on BigDecimal | Drop-in acceleration, backward compatible | +| No MPFR installed | BigDecimal | Works everywhere, pure Mojo | +| Quick calculation under ~100 digits | BigDecimal | FFI overhead not worth it at low precision | -### 9.7 Future: Global Variable or Context +### 9.9 Future: Lazy Evaluation for BigDecimal -When Mojo adds global variables, the `gmp` parameter can be replaced or supplemented by -a module-level default: +The `gmp=True` per-call approach has a weakness: chained operations convert back and +forth between BigDecimal and MPFR at every step. A future `lazy()` API fixes this: ```mojo -# Future (when Mojo supports global variables) -BigDecimal.set_default_engine("gmp") # Sets global default -var result = sqrt(x, 1000) # Automatically uses GMP +# Future API +var result = ( + BigDecimal("23").lazy() + .sqrt(5000) + .multiply(BigDecimal("2.1").lazy().exp(5000)) + .divide(BigDecimal.pi(5000)) + .collect() # executes the whole chain +) ``` -The `gmp` parameter will remain as an explicit override even after global variables -are available, for cases where the user wants fine-grained control. +How it would work: + +1. `.lazy()` returns a `LazyDecimal` that **records operations** instead of evaluating +2. `.collect()` analyzes the expression tree and picks the best execution strategy +3. If MPFR available: convert leaf inputs to MPFR **once**, execute entire DAG in MPFR, + convert root output back **once** — zero intermediate conversions +4. If MPFR unavailable (or `.collect(gmp=False)`): execute natively in BigDecimal + +This is the same pattern as: + +- **Polars** — lazy DataFrames, `.collect()` triggers execution +- **Spark** — query planning, transformations are lazy until action +- **C++ Eigen** — expression templates, deferred evaluation until assignment +- **TensorFlow 1.x** — computation graphs, `session.run()` triggers execution + +The key insight: users write BigDecimal code (familiar API), the engine figures out the +fastest execution path. This goes beyond the roadmap below but is architecturally clean +because BigFloat and the MPFR wrapper already provide the fast backend. ---- +### 9.10 CLI Integration + +With BigFloat as a first-class type, the CLI can offer: + +```bash +decimo "sqrt(2)" -p 1000 # default: BigDecimal +decimo "sqrt(2)" -p 1000 --mode float # use BigFloat (requires MPFR) +decimo "sqrt(2)" -p 1000 --gmp # BigDecimal with gmp=True +``` + +`--mode float` tells the evaluator to parse all literals as BigFloat and evaluate the +entire expression in MPFR. `--gmp` keeps BigDecimal but enables `gmp=True` on each +operation (current plan, deferred to Phase 4). + +### 9.11 Future: Global Variable or Context + +When Mojo gets global variables, a module-level default can supplement the per-call +parameter: + +```mojo +# Future (when Mojo supports global variables) +BigDecimal.set_default_engine("gmp") +var result = sqrt(x, 1000) # automatically uses GMP +``` + +The `gmp` parameter stays as an explicit override for fine-grained control. ## 10. Build System Integration ### 10.1 C Wrapper Compilation -The C wrapper must be compiled as a shared library and distributed alongside the Mojo package. +The C wrapper gets compiled as a shared library, shipped alongside the Mojo package. **Build script** (`build_gmp_wrapper.sh`): @@ -862,13 +904,12 @@ mojo build \ ### 10.4 Runtime Detection: Single Binary Without Link-Time Dependencies -**Goal**: A single compiled binary works whether or not MPFR/GMP is installed on the -target machine. If `gmp=True` is requested and MPFR is found, use it. If not found, -raise a helpful error. Users who never call `gmp=True` pay zero cost. +**Goal**: One compiled binary that works whether or not MPFR/GMP is on the +machine. If `gmp=True` is requested and MPFR is found, use it. If not, raise +a helpful error. If nobody calls `gmp=True`, zero cost. **Verified**: `dlopen` works from Mojo 0.26.2 via `external_call["dlopen", ...]`. -Tested successfully — GMP was loaded at runtime without any `-Xlinker` flags at -compile time. +I tested it — GMP loaded at runtime without any `-Xlinker` flags at compile time. **Architecture**: The C wrapper does lazy loading internally using `dlopen`/`dlsym`, so it compiles and links **without** needing MPFR/GMP headers or libraries present: @@ -952,9 +993,9 @@ mojo build -Xlinker -L./path/to/wrapper -Xlinker -ldecimo_gmp_wrapper \ -o myprogram myprogram.mojo ``` -The wrapper itself is a tiny C file (~200 lines) that can be compiled on any system -with just a C compiler — no GMP/MPFR headers needed. MPFR is only loaded at runtime -when `gmp=True` is actually called. +The wrapper is a tiny C file (~200 lines) that compiles on any system +with just a C compiler — no GMP/MPFR headers needed. MPFR only gets loaded at +runtime when `gmp=True` is actually called. **Summary of approach by timeline**: @@ -963,8 +1004,6 @@ when `gmp=True` is actually called. | **Now** (Mojo 0.26.2) | C wrapper + `dlopen`/`dlsym` | Link wrapper only; MPFR loaded lazily at runtime | | **Future** (DLHandle) | Mojo-native `DLHandle` | Pure Mojo detection, no C wrapper needed | ---- - ## 11. Cross-Platform Considerations ### 11.1 Platform Matrix @@ -1014,8 +1053,6 @@ pkg-config --exists gmp && pkg-config --cflags --libs gmp - Different shared library format (`.dll` + import lib) - Mojo on Windows is still early — defer this concern ---- - ## 12. Risk Analysis ### 12.1 Technical Risks @@ -1046,102 +1083,270 @@ pkg-config --exists gmp && pkg-config --cflags --libs gmp | Unnecessary conversions | Converting back and forth repeatedly | Cache GMP handles during multi-step computations | | GMP slower for specific operations | Possible edge cases | Benchmark-driven thresholds per operation type | ---- - -## 13. Implementation Roadmap - -> Priority: **BigDecimal first**, using **MPFR** (not raw `mpz_t`). MPFR provides -> built-in sqrt, exp, ln, sin, cos, tan, π with correct rounding — so most BigDecimal -> operations are simple one-call wrappers rather than Newton/Taylor implementations. -> BigInt integration is deferred to Phase 3. - -### Phase 1: Foundation — MPFR C Wrapper & GMPDecimal Struct - -**Goal**: Establish the MPFR-based C wrapper with runtime detection, the `GMPDecimal` -intermediate struct, and prove end-to-end with `sqrt`. - -| Step | Task | Detail | Effort | Deps | -| ---- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- | -| 1.1 | **Install MPFR** | `brew install mpfr` (macOS). Verify: `/opt/homebrew/lib/libmpfr.dylib` exists, `#include ` compiles. MPFR depends on GMP (already installed). | Tiny | — | -| 1.2 | **Extend C wrapper with `mpfr_t` handle pool** | Add to `gmp_wrapper.c`: (a) `static mpfr_t f_handles[MAX_HANDLES]` pool, (b) `mpfrw_init(prec_bits) → handle`, (c) `mpfrw_clear(h)`, (d) `mpfrw_set_str(h, str, len)` (length-safe, base-10), (e) `mpfrw_get_str(h, digits) → char*`, (f) `mpfrw_available()`. Keep existing `mpz_t` wrappers intact for BigInt. | Medium | 1.1 | -| 1.3 | **Add MPFR arithmetic wrappers** | One-line C wrappers: `mpfrw_add(r,a,b)`, `mpfrw_sub(r,a,b)`, `mpfrw_mul(r,a,b)`, `mpfrw_div(r,a,b)`, `mpfrw_sqrt(r,a)`, `mpfrw_neg(r,a)`, `mpfrw_abs(r,a)`, `mpfrw_cmp(a,b)`. All use `MPFR_RNDN` (round-to-nearest). Each is 1 line: e.g. `mpfr_sqrt(f_handles[r], f_handles[a], MPFR_RNDN)`. | Small | 1.2 | -| 1.4 | **Add MPFR transcendental wrappers** | `mpfrw_exp(r,a)`, `mpfrw_log(r,a)`, `mpfrw_sin(r,a)`, `mpfrw_cos(r,a)`, `mpfrw_tan(r,a)`, `mpfrw_const_pi(r)`, `mpfrw_pow(r,a,b)`, `mpfrw_rootn_ui(r,a,n)`. Each is 1 line. | Small | 1.2 | -| 1.5 | **Implement lazy `dlopen` loading** | Wrap all MPFR calls behind function pointers resolved via `dlopen`/`dlsym` at first use (see Section 10.4). This makes the wrapper compilable without MPFR headers. Add `mpfrw_available() → 0/1`. | Medium | 1.3, 1.4 | -| 1.6 | **Compile wrapper, run smoke test** | Build `libdecimo_gmp_wrapper.dylib` linking against `-lmpfr -lgmp`. Write minimal Mojo test: `mpfrw_init(200) → set_str("3.14") → mpfrw_sqrt → get_str → print`. Verify output. | Small | 1.5 | -| 1.7 | **Create `src/decimo/bigdecimal/gmp_decimal.mojo`** | Implement `GMPDecimal` struct (single `handle: Int32` field). Constructor from BigDecimal via `str(bd) → mpfrw_set_str`. `to_bigdecimal(precision) → mpfrw_get_str → BigDecimal(s)`. Destructor calls `mpfrw_clear`. | Medium | 1.6 | -| 1.8 | **Wire `gmp=True` into `sqrt()`** | In `exponential.mojo`: add `gmp: Bool = False` parameter. If True: `GMPDecimal(x, prec+10) → mpfrw_sqrt → to_bigdecimal(prec)`. Three lines of new code. | Small | 1.7 | -| 1.9 | **Test sqrt correctness** | For x ∈ {2, 3, 0.5, 100, 10⁻⁸, 10¹⁰⁰⁰}, P ∈ {28, 100, 500, 1000, 5000}: verify `sqrt(x, P, gmp=True)` matches `sqrt(x, P)` to P digits. | Medium | 1.8 | -| 1.10 | **Benchmark sqrt** | Measure wall time including conversion overhead. Find break-even precision where MPFR wins. Expected: above ~50–100 digits. | Small | 1.9 | -| 1.11 | **Update build system** | `pixi.toml` tasks: `build-gmp-wrapper` (compiles C wrapper + links MPFR). `build_gmp_wrapper.sh` for macOS + Linux. | Small | 1.2 | - -**Deliverable**: `sqrt(BigDecimal("2"), precision=5000, gmp=True)` returns correct result. -Internally: 1 string conversion in, 1 MPFR call, 1 string conversion out. - -### Phase 2: All BigDecimal Operations - -**Goal**: Extend `gmp=True` to all compute-heavy operations. Because MPFR has built-in -functions for each, every operation is just a **thin wrapper** — no Newton/Taylor needed. - -| Step | Task | Detail | Effort | Deps | -| ---- | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ------- | -| 2.1 | **Wire `true_divide(x, y, P, gmp=True)`** | `GMPDecimal(x) → GMPDecimal(y) → mpfrw_div → to_bigdecimal`. | Small | Phase 1 | -| 2.2 | **Wire `root(x, n, P, gmp=True)`** | `mpfrw_rootn_ui(r, x_handle, n)` — single MPFR call. | Small | Phase 1 | -| 2.3 | **Wire `exp(x, P, gmp=True)`** | `mpfrw_exp(r, x_handle)` — single MPFR call. | Small | Phase 1 | -| 2.4 | **Wire `ln(x, P, gmp=True)`** | `mpfrw_log(r, x_handle)` — single MPFR call. | Small | Phase 1 | -| 2.5 | **Wire `sin(x, P, gmp)`, `cos`, `tan`** | `mpfrw_sin`, `mpfrw_cos`, `mpfrw_tan` — each single call. | Small | Phase 1 | -| 2.6 | **Wire `pi(P, gmp=True)`** | `mpfrw_const_pi(r)` — single MPFR call. No Chudnovsky needed. | Small | Phase 1 | -| 2.7 | **Instance method wrappers** | Update `self.sqrt()`, `self.exp()`, etc. to accept `gmp: Bool = False`. | Small | 2.6 | -| 2.8 | **Comprehensive test suite** | For each function × {gmp=True, gmp=False}: verify agreement to requested precision. Edge cases: 0, negative, very large/small. Compare against WolframAlpha. | Large | 2.7 | -| 2.9 | **Benchmark all operations** | Table: operation × precision (28, 100, 500, 1000, 5000) × gmp. Include conversion overhead. | Medium | 2.8 | - -**Note**: Because each MPFR operation is a single C call, Phase 2 is significantly -simpler than it would be with `mpz_t` (where we'd need to implement Taylor series, -Chudnovsky, and Newton iteration ourselves in Mojo using GMPDecimal arithmetic). - -**Deliverable**: All iterative BigDecimal functions support `gmp=True`. +## 13. Lessons from Other Libraries + +Before diving into the roadmap, I looked at how major open-source libraries integrate GMP/MPFR to see what I can learn from them. + +### 13.1 mpmath (Python) — Backend Abstraction + +mpmath is *the* arbitrary-precision math library for Python (~1M+ monthly downloads). + +**How mpmath does it**: + +- **Backend swap**: `backend.py` defines `MPZ` (integer type) and `BACKEND` (string). + At startup it tries `import gmpy2`, then `import gmp`, falling back to Python `int`. + All mantissa arithmetic goes through `MPZ` — a single point of substitution. +- **Environment override**: `MPMATH_NOGMPY` env var disables GMP entirely, + useful for debugging and pure-Python deployment. +- **gmpy acceleration hooks**: When gmpy is available, mpmath replaces its + `_normalize` and `from_man_exp` hot-path functions with optimized C versions + from gmpy (`gmpy._mpmath_normalize`, `gmpy._mpmath_create`), and also + `mpf_mul_int` is swapped to a gmpy-accelerated version. This gives 2–5× + speedup on core arithmetic with zero API change. +- **Internal representation**: Binary floating-point tuples `(sign, man, exp, bc)`. + All precision is in **bits**, converted to/from decimal digits via + `dps_to_prec(n) = max(1, int(round((n+1)*3.3219...)))` and + `prec_to_dps(n) = max(1, int(round(n/3.3219...))-1)`. +- **Guard bits**: Extra precision is added internally. Individual functions use + `extraprec` (add N bits) or `workprec` (set to N bits). The `autoprec` decorator + doubles precision until results converge — a "guess-and-verify" paradigm. + +**Takeaway**: The single-variable backend swap (`MPZ = gmpy.mpz`) is clean. +My C wrapper approach is analogous but at a lower level. I should make sure +that all GMP-dependent code stays isolated in a single module. + +### 13.2 gmpy2 (Python ↔ MPFR) — Context-Based Precision + +gmpy2 is the standard Python binding for GMP + MPFR + MPC. + +**How gmpy2 does it**: + +- **Context object**: `gmpy2.context()` controls precision, rounding mode, exception + traps, and exponent range. `get_context()` returns the active context; + `local_context()` provides temporary overrides (similar to Python's `decimal.localcontext()`). +- **5 rounding modes**: RoundToNearest, RoundToZero, RoundUp, RoundDown, RoundAwayZero. + These map directly to MPFR's rounding modes. +- **Thread safety**: MPFR data (flags, emin/emax, default precision) uses + thread-local storage when built with `--enable-thread-safe`. +- **Exception model**: Flags for overflow, underflow, inexact, invalid, divzero, + erange. Each has a corresponding `trap_*` bool — if True, raises a Python + exception; if False, sets the flag silently and returns a special value (NaN, ±Inf). +- **Precision tracking**: Each `mpfr` value carries its own precision. When combining + values of different precisions, the context precision governs the result. +- **Implicit conversion**: `mpz → mpfr` is automatic. `float → mpfr` preserves + all 53 bits. `str → mpfr` uses full specified precision. + +**Takeaway**: My `gmp: Bool = False` parameter approach is simpler but +sufficient. If I later want a richer API, a context object could control precision, +rounding mode, and GMP-enable globally. For now, per-call `gmp=True` is fine. + +### 13.3 Arb/FLINT — Ball Arithmetic & Guard Bits + +Arb (now merged into FLINT) implements rigorous arbitrary-precision ball arithmetic. + +**How Arb works**: + +- **Ball semantics**: Each number is `[midpoint ± radius]`. Operations propagate + error bounds automatically. The user doesn't manually track rounding errors. +- **Binary ↔ decimal conversion**: Arb explicitly warns that this conversion is + lossy. `arb_set_str("2.3", prec)` correctly creates a ball containing 23/10, + unlike a naive `double` assignment. `arb_printn()` only prints digits that are + guaranteed correct. +- **Guess-and-verify paradigm**: Start with modest precision, check if the result + is accurate enough, double precision and retry if not. This is the recommended + approach for complex computations. +- **Guard bits heuristic**: For accuracy of p bits, use working precision O(1)+p. + The O(1) term depends on the expression. "Adding a few guard bits" (10–20 for + basic operations) is usually enough. +- **Polynomial time guarantee**: Arb caps internal work parameters by polynomial + functions of `prec` to prevent runaway computation. E.g., `arb_sin(huge_x)` returns + `[±1]` instead of attempting expensive argument reduction. +- **Handle lifecycle**: GMP-style `arb_init(x)` / `arb_clear(x)`. Output parameters + come first, then inputs, then precision. + +**Takeaway**: The guard bits pattern confirms my approach of using +`prec + 10` in MPFR calls. I should also add a precision cap for safety. +The polynomial time guarantee is worth stealing for my iterative operations. + +### 13.4 python-flint — Direct FLINT/Arb Bindings + +python-flint wraps FLINT (which includes Arb) for Python. + +**How python-flint works**: + +- **Type hierarchy**: Separate types for each number ring: `fmpz` (integers), + `fmpq` (rationals), `arb` (real balls), `acb` (complex balls), plus polynomial + and matrix variants. +- **Global context**: Precision is set globally (`ctx.prec = 333` for ~100 decimal + digits). All operations use the current context precision. + +### 13.5 Consistency Check: My Plan vs. Best Practices + +| Best Practice (from survey) | Status in My Plan | Notes | +| ------------------------------------------ | ------------------- | ----------------------------------------------------------------- | +| Graceful fallback (GMP optional) | ✓ Designed | BigDecimal works without MPFR; BigFloat requires it | +| First-class user-facing type | ✓ BigFloat | Like mpmath's `mpf` — user constructs and chains operations | +| Convenience sugar on existing type | ✓ `gmp=True` | BigDecimal ops accept `gmp=True` for one-off acceleration | +| Isolated backend module | ✓ `bigfloat.mojo` | All MPFR calls in one file | +| Guard bits in MPFR calls | ✓ `prec + 10` | Confirmed adequate by Arb docs | +| Decimal ↔ binary via string (not raw bits) | ✓ Planned | `mpfr_set_str` / `mpfr_get_str`, base 10 | +| Environment variable to disable GMP | ✗ **Not yet** | **TODO**: `DECIMO_NOGMP` env check (like mpmath `MPMATH_NOGMPY`) | +| Handle pool with lifecycle management | ✓ Planned | C wrapper handle pool | +| Rounding mode control | △ Partial | RNDN only. Consider exposing rounding mode later | +| Precision cap for safety | ✗ **Not yet** | **TODO**: Cap internal MPFR precision to prevent OOM | +| Conversion overhead awareness | ✓ Documented | String round-trip mitigated by BigFloat chaining + lazy future | +| Thread safety considerations | ✗ **Not yet** | **TODO**: MPFR handle pool is not thread-safe; document this | +| Benchmark-driven thresholds | ✓ Planned (Phase 1) | Find break-even precision for each operation | +| Lazy evaluation for chained ops | △ Future | `BigDecimal.lazy()...collect()` — deferred, architecturally clean | + +**Gaps I found** (incorporated into the roadmap below): + +1. **`DECIMO_NOGMP` environment variable** (Phase 1) — lets users disable GMP at + runtime, like mpmath's `MPMATH_NOGMPY`. +2. **Precision cap** (Phase 1) — prevent MPFR from allocating unbounded memory + for extreme precision values. +3. **Thread safety docs** (Phase 4) — the C handle pool is not thread-safe; + need to document this. + +## 14. Implementation Roadmap + +> BigFloat first (Phase 1), then BigDecimal `gmp=True` sugar (Phase 2), then BigInt +> backend (Phase 3). MPFR has built-in sqrt, exp, ln, sin, cos, tan, π with correct +> rounding — so BigFloat operations are single MPFR calls, and BigDecimal sugar is +> just BigFloat under the hood. + +### Phase 1: Foundation — MPFR C Wrapper & BigFloat + +**Goal**: Get the MPFR-based C wrapper working, build the `BigFloat` struct as a +first-class user-facing type, and prove the pipeline with `sqrt`. + +- [ ] **1.1 Install MPFR** (Tiny, no deps) + `brew install mpfr` (macOS). Verify: `/opt/homebrew/lib/libmpfr.dylib` exists, + `#include ` compiles. MPFR depends on GMP (already installed). +- [ ] **1.2 Extend C wrapper with `mpfr_t` handle pool** (Medium, deps: 1.1) + Add to `gmp_wrapper.c`: + (a) `static mpfr_t f_handles[MAX_HANDLES]` pool, + (b) `mpfrw_init(prec_bits) → handle`, + (c) `mpfrw_clear(h)`, + (d) `mpfrw_set_str(h, str, len)` (length-safe, base-10), + (e) `mpfrw_get_str(h, digits) → char*`, + (f) `mpfrw_available()`. + Keep existing `mpz_t` wrappers intact for BigInt. + **Add precision cap**: reject `prec_bits > MAX_PREC` (e.g. 1M bits ≈ 300K digits) + to prevent OOM, inspired by Arb's polynomial time guarantee. +- [ ] **1.3 Add MPFR arithmetic wrappers** (Small, deps: 1.2) + One-line C wrappers: `mpfrw_add(r,a,b)`, `mpfrw_sub(r,a,b)`, `mpfrw_mul(r,a,b)`, + `mpfrw_div(r,a,b)`, `mpfrw_sqrt(r,a)`, `mpfrw_neg(r,a)`, `mpfrw_abs(r,a)`, + `mpfrw_cmp(a,b)`. All use `MPFR_RNDN` (round-to-nearest). Each is 1 line. +- [ ] **1.4 Add MPFR transcendental wrappers** (Small, deps: 1.2) + `mpfrw_exp(r,a)`, `mpfrw_log(r,a)`, `mpfrw_sin(r,a)`, `mpfrw_cos(r,a)`, + `mpfrw_tan(r,a)`, `mpfrw_const_pi(r)`, `mpfrw_pow(r,a,b)`, `mpfrw_rootn_ui(r,a,n)`. + Each is 1 line. +- [ ] **1.5 Implement lazy `dlopen` loading** (Medium, deps: 1.3, 1.4) + Wrap all MPFR calls behind function pointers resolved via `dlopen`/`dlsym` at + first use (see Section 10.4). This makes the wrapper compilable without MPFR + headers. Add `mpfrw_available() → 0/1`. + **Add `DECIMO_NOGMP` check**: if set, `mpfrw_available()` returns 0 without + attempting `dlopen` (inspired by mpmath's `MPMATH_NOGMPY`). +- [ ] **1.6 Compile wrapper, run smoke test** (Small, deps: 1.5) + Build `libdecimo_gmp_wrapper.dylib` linking against `-lmpfr -lgmp`. Write minimal + Mojo test: `mpfrw_init(200) → set_str("3.14") → mpfrw_sqrt → get_str → print`. + Verify output. +- [ ] **1.7 Create `src/decimo/bigfloat/bigfloat.mojo`** (Medium, deps: 1.6) + Implement `BigFloat` struct (single `handle: Int32` field). Constructor from + string via `mpfrw_set_str`. Constructor from BigDecimal via `str(bd) → mpfrw_set_str`. + `to_bigdecimal(precision) → mpfrw_get_str → BigDecimal(s)`. Destructor calls + `mpfrw_clear`. Use guard bits: init MPFR with `dps_to_prec(precision) + 20` bits. +- [ ] **1.8 BigFloat arithmetic and transcendentals** (Medium, deps: 1.7) + Wire all MPFR wrappers into BigFloat methods: `sqrt()`, `exp()`, `ln()`, `sin()`, + `cos()`, `tan()`, `root(n)`, `power(y)`, `+`, `-`, `*`, `/`, `pi()`. + Each is a thin wrapper around the corresponding `mpfrw_*` call. +- [ ] **1.9 BigFloat ↔ BigDecimal conversion** (Small, deps: 1.7) + `BigFloat(bd: BigDecimal, precision)` and `bf.to_bigdecimal(precision)`. + Both go through string representation (`mpfr_set_str` / `mpfr_get_str`). +- [ ] **1.10 Test BigFloat correctness** (Medium, deps: 1.8) + For each operation × x ∈ {2, 3, 0.5, 100, 10⁻⁸, 10¹⁰⁰⁰} × P ∈ {28, 100, 500, + 1000, 5000}: verify BigFloat results match BigDecimal results to P digits. + Edge cases: 0, negative, very large/small. +- [ ] **1.11 Benchmark BigFloat** (Small, deps: 1.10) + Measure wall time for BigFloat vs BigDecimal across operations and precisions. + Find break-even precision where BigFloat wins. Expected: above ~50–100 digits. +- [ ] **1.12 Update build system** (Small, deps: 1.2) + `pixi.toml` tasks: `build-gmp-wrapper` (compiles C wrapper + links MPFR). + `build_gmp_wrapper.sh` for macOS + Linux. + +**Deliverable**: `BigFloat("2", 5000).sqrt()` works. Users can construct BigFloat, +chain operations, and convert back to BigDecimal. + +### Phase 2: BigDecimal `gmp=True` Convenience Sugar + +**Goal**: Wire `gmp=True` into BigDecimal compute-heavy operations. Internally each +uses BigFloat (Phase 1), so this is just plumbing. + +- [ ] **2.1 Wire `sqrt(x, P, gmp=True)`** (Small, deps: Phase 1) + If `gmp=True`: `BigFloat(x, P+10).sqrt().to_bigdecimal(P)`. Three lines. +- [ ] **2.2 Wire `true_divide(x, y, P, gmp=True)`** (Small, deps: Phase 1) + `BigFloat(x, P+10) / BigFloat(y, P+10) → to_bigdecimal(P)`. +- [ ] **2.3 Wire `root`, `exp`, `ln`, `sin`, `cos`, `tan`** (Small, deps: Phase 1) + Each is a one-liner delegating to BigFloat. Single MPFR call per operation. +- [ ] **2.4 Wire `pi(P, gmp=True)`** (Small, deps: Phase 1) + `BigFloat.pi(P+10).to_bigdecimal(P)`. No Chudnovsky needed. +- [ ] **2.5 Instance method wrappers** (Small, deps: 2.4) + Update `self.sqrt()`, `self.exp()`, etc. to accept `gmp: Bool = False`. +- [ ] **2.6 Comprehensive test suite** (Large, deps: 2.5) + For each function × {gmp=True, gmp=False}: verify agreement to requested + precision. Edge cases: 0, negative, very large/small. +- [ ] **2.7 Benchmark gmp=True vs gmp=False** (Medium, deps: 2.6) + Table: operation × precision (28, 100, 500, 1000, 5000) × gmp. Include + conversion overhead. + +**Deliverable**: All iterative BigDecimal functions support `gmp=True`. Existing +BigDecimal API unchanged for `gmp=False`. **Checkpoint test**: `pi(precision=10000, gmp=True)` matches known π digits and runs significantly faster than `pi(precision=10000)`. ### Phase 3: BigInt GMP Backend (Deferred) -**Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred because integer -operations are less user-facing. +**Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred — integer +ops are less user-facing. -| Step | Task | Detail | Effort | Deps | -| ---- | ----------------------------------------------- | ---------------------------------------------------------------------- | ------ | ------- | -| 3.1 | **Word import/export in C wrapper** | Complete `gmpw_import_u32_le` / `gmpw_export_u32_le`. Test round-trip. | Small | Phase 1 | -| 3.2 | **Create `src/decimo/bigint/gmp_backend.mojo`** | GMPBigInt struct. Convert BigInt ↔ GMP via O(n) word import/export. | Medium | 3.1 | -| 3.3 | **Add `gmp: Bool = False` to BigInt ops** | On `multiply`, `floor_divide`, `sqrt`, `gcd`, `power` free functions. | Medium | 3.2 | -| 3.4 | **Tests and benchmarks** | Verify correctness. Confirm 7–18× speedup. | Medium | 3.3 | +- [ ] **3.1 Word import/export in C wrapper** (Small, deps: Phase 1) + Complete `gmpw_import_u32_le` / `gmpw_export_u32_le`. Test round-trip. +- [ ] **3.2 Create `src/decimo/bigint/gmp_backend.mojo`** (Medium, deps: 3.1) + GMPBigInt struct. Convert BigInt ↔ GMP via O(n) word import/export. +- [ ] **3.3 Add `gmp: Bool = False` to BigInt ops** (Medium, deps: 3.2) + On `multiply`, `floor_divide`, `sqrt`, `gcd`, `power` free functions. +- [ ] **3.4 Tests and benchmarks** (Medium, deps: 3.3) + Verify correctness. Confirm 7–18× speedup. ### Phase 4: Polish, Distribution & Future -| Step | Task | Effort | Deps | -| ---- | ------------------------------------------------------------------------ | ------ | ---------------- | -| 4.1 | **Cross-platform wrapper compilation (Linux)** | Small | Phase 1 | -| 4.2 | **Pre-built wrapper binaries (macOS ARM64, Linux x86)** | Medium | 4.1 | -| 4.3 | **CLI `--gmp` flag** | Medium | Phase 2 | -| 4.4 | **GMPDecimal expression chaining** (user-facing) | Medium | Phase 2 | -| | ↳ `gx = GMPDecimal(x); r = gx.sqrt().multiply(gx.exp()).to_bigdecimal()` | | | -| 4.5 | **Documentation: user guide for GMP/MPFR integration** | Small | Phase 2 | -| 4.6 | **Pure-Mojo runtime detection (when DLHandle available)** | Medium | Mojo improvement | -| 4.7 | **GMP rational numbers (`mpq_t`) for exact division** | Medium | Future | +- [ ] **4.1 Cross-platform wrapper compilation (Linux)** (Small, deps: Phase 1) +- [ ] **4.2 Pre-built wrapper binaries (macOS ARM64, Linux x86)** (Medium, deps: 4.1) +- [ ] **4.3 CLI `--mode float` and `--gmp` flags** (Medium, deps: Phase 2) + `--mode float` evaluates in BigFloat; `--gmp` enables `gmp=True` on BigDecimal ops. +- [ ] **4.4 Documentation: user guide for BigFloat and GMP integration** (Small, deps: Phase 2) +- [ ] **4.5 Pure-Mojo runtime detection (when DLHandle available)** (Medium, deps: Mojo improvement) +- [ ] **4.6 GMP rational numbers (`mpq_t`) for exact division** (Medium, future) +- [ ] **4.7 Document thread safety** (Small, deps: Phase 1) + The C handle pool is not thread-safe; MPFR requires `--enable-thread-safe` for + concurrent use. Document this limitation. +- [ ] **4.8 Lazy evaluation: `BigDecimal.lazy()...collect()`** (Large, future) + Record a DAG of operations, execute entire chain in MPFR with zero intermediate + conversions. See Section 9.9 for design. Architecturally clean because BigFloat + and the MPFR wrapper (Phase 1) already provide the backend. ### Estimated Effort Summary -| Phase | Scope | Steps | Complexity | Priority | -| --------------------------- | ---------------------------------------------- | ----- | --------------------------------------------- | -------- | -| Phase 1: Foundation + sqrt | MPFR wrapper, GMPDecimal, runtime detect, sqrt | 11 | Medium — most work is C wrapper + plumbing | **HIGH** | -| Phase 2: All BigDecimal ops | divide, exp, ln, trig, pi, root | 9 | **Low** — each op is a thin wrapper over MPFR | **HIGH** | -| Phase 3: BigInt backend | Word import/export, mpz_t dispatch | 4 | Medium | Deferred | -| Phase 4: Polish | CLI, cross-platform, docs, chaining | 7 | Varies | Low | +- **Phase 1: MPFR wrapper + BigFloat** — 12 steps, Medium complexity (most work is C wrapper + BigFloat struct). **HIGH** priority. +- **Phase 2: BigDecimal gmp=True sugar** — 7 steps, **Low** complexity (each op delegates to BigFloat). **HIGH** priority. +- **Phase 3: BigInt backend** — 4 steps, Medium complexity. Deferred. +- **Phase 4: Polish + lazy eval** — 8 steps, varies. Low priority. ---- +## 15. Appendix: Prototype Results -## 14. Appendix: Prototype Results - -### 14.1 Files Created +### 15.1 Files Created | File | Purpose | Status | | ----------------------------------- | ------------------------- | -------------------------------------- | @@ -1151,7 +1356,7 @@ operations are less user-facing. | `temp/gmp/bench_gmp_vs_decimo.mojo` | GMP benchmark suite | ✓ Complete | | `temp/gmp/build_and_test.sh` | Build and test automation | ✓ Complete | -### 14.2 C Wrapper API Summary +### 15.2 C Wrapper API Summary ```c // Lifecycle @@ -1198,12 +1403,12 @@ int gmpw_available(void); const char* gmpw_version(void); ``` -### 14.3 Sample Mojo FFI Calls +### 15.3 Sample Mojo FFI Calls ```mojo from sys.ffi import external_call, c_int -# ⚠️ SAFE pattern: pass length explicitly to avoid use-after-free (see Section 15.1) +# ⚠️ SAFE pattern: pass length explicitly to avoid use-after-free (see Section 16.1) def gmp_set_str(h: c_int, s: String): var slen = c_int(len(s)) var ptr = Int(s.unsafe_ptr()) @@ -1223,7 +1428,7 @@ def gmp_add(r: c_int, a: c_int, b: c_int): external_call["gmpw_add", NoneType, c_int, c_int, c_int](r, a, b) ``` -### 14.4 Benchmark Data Table +### 15.4 Benchmark Data Table **GMP absolute timings** (per operation, macOS ARM64, Apple M-series): @@ -1235,18 +1440,15 @@ def gmp_add(r: c_int, a: c_int, b: c_int): | 10,000 | 50 µs | 226 ns | 30 µs | 21 µs | 2 µs | | 100,000 | 939 µs | 2 µs | 740 µs | 642 µs | 324 µs | ---- - -## 15. Lessons Learned from Prototype Development +## 16. Lessons Learned from Prototype Development -This section documents issues encountered during the GMP prototype build-test cycle -and their fixes. **Read this before starting implementation** — every item below -cost significant debugging time and will recur if not handled. +This section documents gotchas from the GMP prototype build-test cycle +and how I fixed them. **Read this before starting implementation** — every item +cost me real debugging time and will bite again if not handled. -### 15.1 CRITICAL: Mojo String Use-After-Free in FFI Calls +### 16.1 CRITICAL: Mojo String Use-After-Free in FFI -**Symptom**: GMP's `mpz_set_str()` silently fails (returns -1), leaving the mpz value -as 0. Subsequent operations on zero values trigger `__gmp_overflow_in_mpz` in division +**Symptom**: GMP's `mpz_set_str()` silently fails (returns -1), leaving the mpz as 0. Subsequent operations on zero values trigger `__gmp_overflow_in_mpz` in division or crash with corrupt data. **Root cause**: When you write: @@ -1256,15 +1458,15 @@ def gmp_set_str(h: c_int, s: String): _ = external_call["gmpw_set_str", c_int, c_int, Int](h, Int(s.unsafe_ptr())) ``` -Mojo's optimizer may drop the `String` parameter `s` (freeing its backing buffer) before -`external_call` actually reads the pointer. Under heavy allocation pressure (millions of -BigInt/BigDecimal operations), the freed buffer is immediately reused by the allocator, -overrwriting the null terminator. GMP then reads past the intended string into adjacent -memory, encounters non-digit bytes, and returns error code -1. +Mojo's optimizer may drop the `String` parameter `s` (freeing its buffer) before +`external_call` reads the pointer. Under heavy allocation pressure (millions of +BigInt/BigDecimal ops), the freed buffer gets reused immediately, +overwriting the null terminator. GMP reads past the string into adjacent +memory, hits non-digit bytes, returns -1. -**How we found it**: The crash only appeared in the full benchmark binary after -millions of prior allocations, never in small test files. Adding `fprintf(stderr)` -logging to the C wrapper revealed that: +**How I found it**: The crash only showed up in the full benchmark binary after +millions of prior allocations — never in small test files. Adding `fprintf(stderr)` +logging to the C wrapper revealed: - `strlen()` on the pointer showed 109 instead of expected 100 - `mpz_set_str` returned -1 (failure), not 0 (success) @@ -1292,17 +1494,17 @@ def gmp_set_str(h: c_int, s: String): _ = external_call["gmpw_set_str_n", c_int, c_int, Int, c_int](h, ptr, slen) ``` -**Rule**: **ALWAYS use `gmpw_set_str_n` with explicit length for any String → C FFI -call.** Never rely on null-termination of Mojo `String.unsafe_ptr()` in `external_call`. +**Rule**: **Always use `gmpw_set_str_n` with explicit length for any String → C FFI +call.** Don't rely on null-termination of Mojo `String.unsafe_ptr()` in `external_call`. -**Broader implication**: This affects ALL FFI calls that pass Mojo `String` pointers. -Every such call in the codebase must either: +**Broader implication**: This affects ALL FFI calls passing Mojo `String` pointers. +Every such call must either: 1. Use the length-aware C variant (recommended), or 2. Store the String in a local variable AND ensure no other allocation occurs between `unsafe_ptr()` extraction and the `external_call` (fragile, not recommended) -### 15.2 `mojo run` Does Not Support `-Xlinker` Flags +### 16.2 `mojo run` Does Not Support `-Xlinker` Flags **Symptom**: `mojo run file.mojo -Xlinker -lgmp_wrapper` ignores the linker flags, causing "undefined symbol" errors at runtime. @@ -1322,7 +1524,7 @@ pixi run mojo build -Xlinker -L... -Xlinker -lgmp_wrapper -o program program.moj pixi run mojo run -Xlinker -L... program.mojo ``` -### 15.3 Mojo 0.26.2 Syntax Differences +### 16.3 Mojo 0.26.2 Syntax Differences Several Mojo syntax changes from earlier versions caused compile errors: @@ -1333,29 +1535,28 @@ Several Mojo syntax changes from earlier versions caused compile errors: | UInt → Float64 | `Float64(some_uint)` | `Float64(Int(some_uint))` | | UnsafePointer | `UnsafePointer[Byte](address=ptr)` | `UnsafePointer[Byte](unsafe_from_address=ptr)` | -### 15.4 GMP `__gmp_overflow_in_mpz` Means Bad Input, Not Overflow +### 16.4 GMP `__gmp_overflow_in_mpz` Means Bad Input, Not Overflow -Despite the name, `__gmp_overflow_in_mpz` typically means the `mpz_t` struct has -corrupted metadata (alloc=0, size=0, or invalid limb pointer), NOT that a computation -result was too large. If you see this error: +Despite the name, `__gmp_overflow_in_mpz` usually means the `mpz_t` struct has +corrupted metadata (alloc=0, size=0, or bad limb pointer), NOT that a result was too large. If you see this: 1. **First check**: Did `mpz_set_str` fail? (Return value -1 means the string was invalid.) 2. **Second check**: Was the handle cleared and reused? 3. **Third check**: Was the handle ever initialized? -In our case it was always (1) — the string pointer was dangling. +In my case it was always (1) — the string pointer was dangling. -### 15.5 Handle Pool Sizing +### 16.5 Handle Pool Sizing -The C wrapper uses a static array of `MAX_HANDLES = 4096` handles. For Phase 1 (sqrt), +The C wrapper uses a static array of `MAX_HANDLES = 4096`. For Phase 1 (sqrt), each Newton iteration typically uses 3–5 temporary handles, so 4096 is plenty. But for Phase 2 (complex expression trees), monitor handle usage. Consider: - Adding `gmpw_handles_in_use()` diagnostic function - Expanding to dynamic allocation if 4096 is insufficient -- Ensuring every code path calls `gmpw_clear()` (RAII via `GMPDecimal.__del__`) +- Ensuring every code path calls `gmpw_clear()` (RAII via `BigFloat.__del__`) -### 15.6 Build Command Template +### 16.6 Build Command Template The full build command for any GMP-linked Mojo file on macOS: @@ -1377,7 +1578,7 @@ Key points: - The binary must be run from the directory containing `libgmp_wrapper.dylib` (or use absolute rpath) -### 15.7 Debugging Tip: Add `fprintf(stderr)` to C Wrapper +### 16.7 Debugging Tip: Add `fprintf(stderr)` to C Wrapper When debugging FFI issues, temporarily add `fprintf(stderr)` logging to the C wrapper functions. This is invaluable because Mojo's error reporting for FFI issues is limited. @@ -1386,26 +1587,24 @@ The C wrapper can inspect raw pointer values, string contents, and GMP internal Always remove debug logging before benchmarking — `fprintf` has measurable overhead. ---- - ## Summary -GMP/MPFR integration is **technically proven and recommended** for Decimo, with BigDecimal -as the priority target using **MPFR** (not raw `mpz_t`): - -1. **MPFR for BigDecimal** — Uses `mpfr_t` (arbitrary-precision binary float) instead of - `mpz_t` + manual scale tracking. Every operation (sqrt, exp, ln, sin, cos, tan, π, - divide) is a **single MPFR call**. No Newton iteration or Taylor series implementation - needed from the Mojo side. The `GMPDecimal` struct has just one field (`handle`). -2. **`gmp: Bool = False`** — Simple opt-in parameter on compute-heavy functions. Defaults - to native Decimo for backward compatibility. Guard digits handle binary↔decimal rounding. -3. **Single binary, runtime detection** — The C wrapper uses `dlopen`/`dlsym` to load - MPFR lazily at runtime (verified working in Mojo 0.26.2). Users who never call - `gmp=True` pay zero cost. If MPFR isn't installed, a clear error message is raised. -4. **BigInt deferred** to Phase 3 (uses `mpz_t` with word import/export). -5. **CLI `--gmp` deferred** to Phase 4. -6. **Phase 2 is trivial** — Because MPFR has built-in functions, wiring each BigDecimal - operation is a thin wrapper (~5 lines each), not a reimplementation of algorithms. - -The prototype in `temp/gmp/` and the lessons in Section 15 provide a foundation for -implementation. Start with Phase 1, Step 1.1 (`brew install mpfr`). +GMP/MPFR integration is **proven and worth doing** for Decimo. Two-type architecture: + +1. **BigFloat** — new first-class MPFR-backed binary float type. Like mpmath's `mpf` + but faster. Every operation (sqrt, exp, ln, sin, cos, tan, π, divide) is a **single + MPFR call**. Struct is one field (`handle: Int32`). Requires MPFR — no fallback, + no apologies. Users without MPFR use BigDecimal. +2. **BigDecimal `gmp=True`** — convenience sugar for one-off acceleration. Internally + delegates to BigFloat. Backward-compatible: defaults to `gmp=False`, existing code + unchanged. Guard digits handle binary↔decimal rounding. +3. **Two types, two use cases** — BigFloat for fast scientific computing (binary + semantics). BigDecimal for exact decimal arithmetic (finance, human-friendly output). + Both in one library. +4. **Single binary, runtime detection** — C wrapper uses `dlopen`/`dlsym` to load + MPFR lazily (verified on Mojo 0.26.2). If MPFR isn't installed, BigDecimal works + fine without it; BigFloat raises a clear error. +5. **BigInt** deferred to Phase 3 (`mpz_t` with word import/export, ~10× speedup). +6. **Future: lazy evaluation** — `BigDecimal.lazy()...collect()` records an operation + DAG and executes the whole chain in MPFR with zero intermediate conversions. + Deferred to Phase 4, but architecturally clean because BigFloat provides the backend. From 30454666f9a34855fb7aa152ae065cfae689df03 Mon Sep 17 00:00:00 2001 From: ZHU Yuhao Date: Fri, 3 Apr 2026 22:41:04 +0200 Subject: [PATCH 4/6] Update planning --- docs/plans/gmp_integration.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/plans/gmp_integration.md b/docs/plans/gmp_integration.md index d84b951f..7ce47e9a 100644 --- a/docs/plans/gmp_integration.md +++ b/docs/plans/gmp_integration.md @@ -1211,7 +1211,7 @@ python-flint wraps FLINT (which includes Arb) for Python. ## 14. Implementation Roadmap > BigFloat first (Phase 1), then BigDecimal `gmp=True` sugar (Phase 2), then BigInt -> backend (Phase 3). MPFR has built-in sqrt, exp, ln, sin, cos, tan, π with correct +> backend (Phase 3 or never). MPFR has built-in sqrt, exp, ln, sin, cos, tan, π with correct > rounding — so BigFloat operations are single MPFR calls, and BigDecimal sugar is > just BigFloat under the hood. @@ -1306,7 +1306,7 @@ BigDecimal API unchanged for `gmp=False`. **Checkpoint test**: `pi(precision=10000, gmp=True)` matches known π digits and runs significantly faster than `pi(precision=10000)`. -### Phase 3: BigInt GMP Backend (Deferred) +### Phase 3: BigInt GMP Backend (Deferred, maybe never) **Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred — integer ops are less user-facing. From 6801226ea2e455e4423e369d5d3bc791d2a355a3 Mon Sep 17 00:00:00 2001 From: ZHU Yuhao Date: Sat, 4 Apr 2026 00:25:43 +0200 Subject: [PATCH 5/6] Initialize files --- .gitignore | 3 + NOTICE | 26 ++ docs/readme_unreleased.md | 22 +- pixi.toml | 4 + src/decimo/bigfloat/__init__.mojo | 30 +++ src/decimo/bigfloat/bigfloat.mojo | 98 +++++++ src/decimo/bigfloat/mpfr_wrapper.mojo | 198 ++++++++++++++ src/decimo/gmp/README.md | 43 +++ src/decimo/gmp/build_gmp_wrapper.sh | 41 +++ src/decimo/gmp/gmp_wrapper.c | 374 ++++++++++++++++++++++++++ 10 files changed, 823 insertions(+), 16 deletions(-) create mode 100644 NOTICE create mode 100644 src/decimo/bigfloat/__init__.mojo create mode 100644 src/decimo/bigfloat/bigfloat.mojo create mode 100644 src/decimo/bigfloat/mpfr_wrapper.mojo create mode 100644 src/decimo/gmp/README.md create mode 100755 src/decimo/gmp/build_gmp_wrapper.sh create mode 100644 src/decimo/gmp/gmp_wrapper.c diff --git a/.gitignore b/.gitignore index a05a2e40..3b4a4852 100644 --- a/.gitignore +++ b/.gitignore @@ -28,6 +28,9 @@ kgen.trace.json* local # CLI binary /decimo +# Compiled GMP/MPFR wrapper (built from source) +*.dylib +*.so # Python build artifacts *.so __pycache__/ diff --git a/NOTICE b/NOTICE new file mode 100644 index 00000000..fdcee941 --- /dev/null +++ b/NOTICE @@ -0,0 +1,26 @@ +Decimo +Copyright 2025 Yuhao Zhu + +Licensed under the Apache License, Version 2.0. + + +Third-Party Dependencies +======================== + +Decimo optionally uses the following libraries at runtime, if installed by +the user. Decimo does not include, copy, or distribute any source code or +binaries from these projects. + +GNU MPFR Library (https://www.mpfr.org/) + Copyright (C) Free Software Foundation, Inc. + Licensed under the GNU Lesser General Public License v3.0 or later (LGPLv3+). + +GNU Multiple Precision Arithmetic Library (GMP) (https://gmplib.org/) + Copyright (C) Free Software Foundation, Inc. + Licensed under the GNU Lesser General Public License v3.0 or later (LGPLv3+), + or the GNU General Public License v2.0 or later (GPLv2+). + +MPFR and GMP are loaded at runtime via dlopen/dlsym when the user has +independently installed them (e.g. `brew install mpfr` on macOS or +`apt install libmpfr-dev` on Linux). The BigFloat type requires MPFR; +all other Decimo types work without any external dependencies. diff --git a/docs/readme_unreleased.md b/docs/readme_unreleased.md index 59895438..9cf0ec58 100644 --- a/docs/readme_unreleased.md +++ b/docs/readme_unreleased.md @@ -3,16 +3,16 @@ An arbitrary-precision integer and decimal library for [Mojo](https://www.modular.com/mojo), inspired by Python's `int` and `Decimal`. [![Version](https://img.shields.io/github/v/tag/forfudan/decimo?label=version&color=blue)](https://github.com/forfudan/decimo/releases) +[![Mojo](https://img.shields.io/badge/mojo-0.26.2-orange)](https://docs.modular.com/mojo/manual/) [![pixi](https://img.shields.io/badge/pixi%20add-decimo-purple)](https://prefix.dev/channels/modular-community/packages/decimo) [![CI](https://img.shields.io/github/actions/workflow/status/forfudan/decimo/run_tests.yaml?branch=main&label=tests)](https://github.com/forfudan/decimo/actions/workflows/run_tests.yaml) -[![Last Commit](https://img.shields.io/github/last-commit/forfudan/decimo?color=red)](https://github.com/forfudan/decimo/commits/main)