From 76e3dd46803ca7d41ad0987f963c8aae36d07799 Mon Sep 17 00:00:00 2001
From: ZHU Yuhao <dr.yuhao.zhu@outlook.com>
Date: Sun, 29 Mar 2026 18:20:18 +0200
Subject: [PATCH 1/6] Update readme

---
 README.md          | 4 +---
 docs/readme_cli.md | 0
 2 files changed, 1 insertion(+), 3 deletions(-)
 create mode 100644 docs/readme_cli.md

diff --git a/README.md b/README.md
index 6020858b..a9cd6a2a 100644
--- a/README.md
+++ b/README.md
@@ -40,11 +40,9 @@ The core types are[^auxiliary]:
 | `Decimal` | `BDec`, `BigDecimal` | Equivalent to Python's `decimal.Decimal` | Base-10^9               |
 | `Dec128`  | `Decimal128`         | 128-bit fixed-precision decimal type     | Triple 32-bit words     |
 
----
-
 **Decimo** combines "**Deci**mal" and "**Mo**jo" - reflecting its purpose and implementation language. "Decimo" is also a Latin word meaning "tenth" and is the root of the word "decimal".
 
----
+A CLI calculator tool, built on top of the Decimo library and powered by [ArgMojo](https://github.com/forfudan/argmojo) (a command-line argument parser library), is also available in this repository. It provides a convenient way to perform high-precision calculations directly from the command line.
 
 This repository includes a built-in [TOML parser](./docs/readme_toml.md) (`decimo.toml`), a lightweight pure-Mojo implementation supporting TOML v1.0. It parses configuration files and test data, supporting basic types, arrays, and nested tables. While created for Decimo's testing framework, it offers general-purpose structured data parsing with a clean, simple API.
 
diff --git a/docs/readme_cli.md b/docs/readme_cli.md
new file mode 100644
index 00000000..e69de29b

From c79efb6d15223d19de6f2f0df2fceb43694cd9fb Mon Sep 17 00:00:00 2001
From: ZHU Yuhao <dr.yuhao.zhu@outlook.com>
Date: Fri, 3 Apr 2026 19:34:14 +0200
Subject: [PATCH 2/6] Add plan for gmp integration

---
 docs/plans/gmp_integration.md | 1411 +++++++++++++++++++++++++++++++++
 1 file changed, 1411 insertions(+)
 create mode 100644 docs/plans/gmp_integration.md

diff --git a/docs/plans/gmp_integration.md b/docs/plans/gmp_integration.md
new file mode 100644
index 00000000..4fbb5c4b
--- /dev/null
+++ b/docs/plans/gmp_integration.md
@@ -0,0 +1,1411 @@
+# GMP Integration Plan for Decimo
+
+> **Date**: 2026-04-02  
+> **Target**: decimo >=0.9.0  
+> **Mojo Version**: >=0.26.2  
+> **GMP Version**: 6.3.0 (tested on macOS ARM64, Apple Silicon)
+>
+> 子曰工欲善其事必先利其器  
+> The mechanic, who wishes to do his work well, must first sharpen his tools -- Confucius
+
+## 1. Executive Summary
+
+### Verdict: FEASIBLE ✓
+
+GMP integration into Decimo is technically feasible and has been proven through a working prototype. A C wrapper approach (`gmp_wrapper.c`) successfully bridges Mojo's FFI layer to GMP, with all core arithmetic operations verified.
+
+### Key Findings
+
+| Aspect                   | Status      | Notes                                                                           |
+| ------------------------ | ----------- | ------------------------------------------------------------------------------- |
+| Mojo → C FFI             | ✓ Works     | Via `external_call` with link-time binding                                      |
+| GMP arithmetic ops       | ✓ All work  | Add, sub, mul, div, mod, pow, sqrt, gcd, bitwise                                |
+| Build pipeline           | ✓ Works     | `mojo build` with `-Xlinker` flags                                              |
+| Runtime detection        | ! Partial   | Build-time detection feasible; true runtime dlopen not available in Mojo 0.26.2 |
+| BigInt ↔ GMP conversion  | ✓ Efficient | Both use base-2^32, near-trivial O(n) conversion                                |
+| BigUInt/BigDecimal ↔ GMP | ! Costly    | Requires base-10^9 → binary conversion (O(n²) naïve, O(n·log²n) D&C)            |
+| `mojo run` (JIT)         | ✗ Broken    | `-Xlinker` flags ignored in JIT mode; only `mojo build` works                   |
+| Cross-platform           | ! Varies    | macOS/Linux straightforward; Windows needs MPIR or MinGW GMP                    |
+
+### Recommendation
+
+Integrate GMP as an **optional backend for BigInt** (base-2^32) first, where data conversion
+is cheap and measured speedups are 7–18× across operations. BigDecimal integration should
+follow only for operations where GMP's speed advantage (15–353× in benchmarks) outweighs
+the base-10⁹ ↔ binary conversion cost — primarily large multiplications, sqrt, and GCD at
+1,000+ digits.
+
+---
+
+## 2. Technical Feasibility Assessment
+
+### 2.1 What Was Tested
+
+A complete prototype was built in `temp/gmp/` consisting of:
+
+1. **C wrapper** (`gmp_wrapper.c`): Handle-based API wrapping GMP's `mpz_t` operations
+2. **Shared library** (`libgmp_wrapper.dylib`): Compiled C wrapper linked against GMP
+3. **Mojo FFI test** (`test_gmp_ffi.mojo`): 18 tests covering all operation categories
+4. **Mojo benchmark** (`bench_gmp_vs_decimo.mojo`): Performance characterization across digit sizes
+
+### 2.2 Prototype Results
+
+| Test Category              | Result                                |
+| -------------------------- | ------------------------------------- |
+| Initialization & cleanup   | ✓ Pass                                |
+| String import/export       | ✓ Pass                                |
+| Addition, subtraction      | ✓ Pass                                |
+| Multiplication             | ✓ Pass                                |
+| Floor/truncate division    | ✓ Pass                                |
+| Modulo                     | ✓ Pass                                |
+| Power, modular power       | ✓ Pass                                |
+| Square root                | ✓ Pass                                |
+| GCD                        | ✓ Pass                                |
+| Negation, absolute value   | ✓ Pass                                |
+| Comparison, sign           | ✓ Pass                                |
+| Bitwise AND, OR, XOR       | ✓ Pass                                |
+| Left/right shift           | ✓ Pass                                |
+| Availability/version check | ✓ Pass                                |
+| UInt32 word import/export  | ! Partial (parameter alignment issue) |
+
+### 2.3 Why C Wrapper Is Required
+
+Direct FFI to GMP is impractical in Mojo 0.26.2 because:
+
+1. **No `DLHandle`**: Mojo 0.26.2 does not expose `DLHandle` for runtime dynamic library loading. Only `external_call` (compile-time/link-time binding) is available.
+2. **`mpz_t` is a struct**: GMP's `mpz_t` is `__mpz_struct[1]` (a 1-element array used as a pass-by-reference trick). Mojo cannot directly represent or pass this.
+3. **Pointer origin tracking**: `UnsafePointer` in Mojo 0.26.2 has strict origin tracking that prevents constructing pointers from raw integer addresses in non-`fn` contexts.
+4. **GMP macros**: Many GMP operations (e.g., `mpz_init`, `mpz_clear`) are preprocessor macros, not true function symbols.
+
+The C wrapper solves all these issues by exposing a flat, handle-based C API where Mojo passes and receives only primitive types (`Int32`, `Int`).
+
+---
+
+## 3. Mojo FFI Capabilities and Constraints
+
+### 3.1 Available FFI Mechanism
+
+```mojo
+from sys.ffi import external_call
+
+# Example: calling a C function
+var result = external_call["gmpw_add", NoneType, Int32, Int32, Int32](a, b, c)
+```
+
+- `external_call` resolves symbols at **link time** (not runtime)
+- Function signature must be known at compile time
+- Supports: `Int`, `Int32`, `UInt32`, `Float64`, `NoneType`, pointer-as-`Int`
+- Does **not** support: variadic args, struct-by-value, function pointers
+
+### 3.2 Constraints Discovered
+
+| Constraint                       | Impact                                            | Workaround                                              |
+| -------------------------------- | ------------------------------------------------- | ------------------------------------------------------- |
+| No `DLHandle` (runtime dlopen)   | Cannot conditionally load GMP at runtime          | Compile-time feature flag or separate build targets     |
+| `-Xlinker` ignored in `mojo run` | JIT mode cannot link external libs                | Must use `mojo build` + execute binary                  |
+| `UnsafePointer` origin tracking  | Cannot construct pointers from addresses in `def` | Pass pointers as `Int` type                             |
+| No struct-by-value in FFI        | Cannot pass/return C structs                      | Handle-based API (C wrapper manages structs internally) |
+| `UnsafePointer.alloc()` removed  | Cannot allocate raw memory directly               | Use `List[T](length=n, fill=v)` + `.unsafe_ptr()`       |
+| Same-pointer aliasing forbidden  | Cannot pass same buffer twice to `external_call`  | Use separate buffer variables                           |
+
+### 3.3 Future Mojo Improvements That Would Help
+
+- **`DLHandle` support**: Would enable true runtime detection and conditional loading
+- **Stable `UnsafePointer` API**: Would simplify pointer handling
+- **Global variables**: Would allow module-level GMP backend state
+- **Conditional compilation (`@static_if`)**: Would enable clean feature toggling
+
+---
+
+## 4. Architecture Analysis
+
+### 4.1 Current Decimo Architecture
+
+```txt
+Decimo Types                Internal Representation
+─────────────               ───────────────────────
+BigInt (BInt)        →      List[UInt32] in base-2^32 + Bool sign
+BigUInt              →      List[UInt32] in base-10^9 (unsigned)
+BigDecimal (Decimal) →      BigUInt coefficient + Int scale + Bool sign
+Dec128               →      3×UInt32 coefficient + UInt32 flags (fixed 128-bit)
+```
+
+### 4.2 GMP Architecture
+
+```txt
+GMP Types                   Internal Representation
+─────────────               ───────────────────────
+mpz_t                →      mp_limb_t* (base-2^64 on ARM64) + int size + int alloc
+mpq_t                →      mpz_t numerator + mpz_t denominator
+mpf_t                →      Binary floating-point with configurable precision
+```
+
+### 4.3 Type Mapping
+
+| Decimo Type    | GMP Equivalent                      | Conversion Cost                                       | Integration Priority |
+| -------------- | ----------------------------------- | ----------------------------------------------------- | -------------------- |
+| **BigInt**     | `mpz_t`                             | **Low** — same binary base (2^32 → 2^64 word packing) | **Phase 1** (HIGH)   |
+| **BigUInt**    | `mpz_t` (unsigned subset)           | **High** — base-10^9 → binary conversion              | Phase 3 (LOW)        |
+| **BigDecimal** | `mpf_t` or `mpz_t` + manual scaling | **High** — base conversion + scale management         | Phase 2 (MEDIUM)     |
+| **Dec128**     | Not applicable                      | N/A — fixed precision, already fast                   | Skip                 |
+
+### 4.4 Where GMP Adds Value
+
+GMP's advantages over Decimo's native implementation:
+
+1. **Highly optimized assembly**: GMP includes hand-tuned ARM64 assembly for core operations
+2. **Mature algorithm suite**: Decades of optimization (Schönhage-Strassen for huge multiplications, sub-quadratic GCD, etc.)
+3. **Large-number regime**: GMP's algorithmic cutoffs are well-tuned for every platform
+4. **Additional algorithms**: FFT-based multiplication, extended GCD, primality testing, etc.
+
+Where GMP does **not** add value:
+
+1. **Small numbers** (<50 digits): FFI overhead dominates
+2. **Dec128**: Fixed 128-bit operations are already optimal with native code
+3. **Base-10 operations**: GMP works in binary; base-10 output requires conversion
+4. **Decimal arithmetic**: GMP has no native decimal type; scale management would remain in Mojo
+
+---
+
+## 5. Performance Comparison: GMP vs Decimo
+
+All benchmarks run side-by-side in a single binary (`bench_gmp_vs_decimo.mojo`), compiled
+with `mojo build -O3` (release mode, assertions off). Each data point is the **median of
+7 runs**. Platform: Apple M2, macOS, GMP 6.3.0, Mojo 0.26.2.
+
+### 5.1 BigInt (base-2³²) vs GMP `mpz`
+
+These compare Decimo's `BigInt` (Karatsuba multiply, Burnikel-Ziegler divide) against
+GMP's `mpz_*` functions. Both operate in binary representation; no base conversion needed.
+
+#### Multiplication (n-digit × n-digit)
+
+| Digits | GMP      | Decimo    | Decimo / GMP |
+| ------ | -------- | --------- | ------------ |
+| 50     | 12 ns    | 106 ns    | 8.8×         |
+| 100    | 21 ns    | 156 ns    | 7.4×         |
+| 500    | 223 ns   | 2.20 µs   | 9.9×         |
+| 1,000  | 733 ns   | 7.45 µs   | 10.2×        |
+| 5,000  | 10.40 µs | 127.44 µs | 12.3×        |
+| 10,000 | 24.45 µs | 380.81 µs | 15.6×        |
+
+GMP is **8–16× faster** for multiplication, with the gap widening at larger sizes due to
+GMP's hand-tuned assembly (Karatsuba / Toom-3 / FFT) versus Decimo's pure-Mojo Karatsuba.
+
+#### Addition (n-digit + n-digit)
+
+| Digits | GMP    | Decimo  | Decimo / GMP |
+| ------ | ------ | ------- | ------------ |
+| 50     | 5 ns   | 51 ns   | 10.2×        |
+| 100    | 5 ns   | 56 ns   | 11.2×        |
+| 500    | 8 ns   | 111 ns  | 13.9×        |
+| 1,000  | 12 ns  | 181 ns  | 15.1×        |
+| 5,000  | 69 ns  | 667 ns  | 9.7×         |
+| 10,000 | 150 ns | 1.24 µs | 8.2×         |
+
+GMP is **8–15× faster** for addition. GMP uses assembly-optimized `mpn_add_n`; Decimo
+uses SIMD vectorization but cannot match hand-written assembly.
+
+#### Floor Division (n-digit / (n/2)-digit)
+
+| Digits | GMP      | Decimo    | Decimo / GMP |
+| ------ | -------- | --------- | ------------ |
+| 100    | 44 ns    | 510 ns    | 11.6×        |
+| 500    | 218 ns   | 1.76 µs   | 8.1×         |
+| 1,000  | 593 ns   | 5.47 µs   | 9.2×         |
+| 5,000  | 7.16 µs  | 81.73 µs  | 11.4×        |
+| 10,000 | 22.62 µs | 254.62 µs | 11.3×        |
+
+GMP is **8–12× faster** for division. Both use sub-quadratic algorithms (Burnikel-Ziegler),
+but GMP's multiply subroutine is faster, and its lower-level code is more optimized.
+
+#### Integer Square Root
+
+| Digits | GMP      | Decimo    | Decimo / GMP |
+| ------ | -------- | --------- | ------------ |
+| 50     | 44 ns    | 205 ns    | 4.7×         |
+| 100    | 81 ns    | 709 ns    | 8.8×         |
+| 500    | 273 ns   | 3.55 µs   | 13.0×        |
+| 1,000  | 569 ns   | 8.41 µs   | 14.8×        |
+| 5,000  | 5.90 µs  | 104.67 µs | 17.7×        |
+| 10,000 | 17.52 µs | 322.26 µs | 18.4×        |
+
+GMP is **5–18× faster** for sqrt. Both use Newton's method, but GMP's inner multiply/divide
+is much faster, and the gap accumulates over Newton iterations.
+
+#### GCD
+
+| Digits | GMP     | Decimo  | Decimo / GMP |
+| ------ | ------- | ------- | ------------ |
+| 50     | 53 ns   | 216 ns  | 4.1×         |
+| 100    | 45 ns   | 247 ns  | 5.5×         |
+| 500    | 93 ns   | 543 ns  | 5.8×         |
+| 1,000  | 142 ns  | 955 ns  | 6.7×         |
+| 5,000  | 646 ns  | 4.21 µs | 6.5×         |
+| 10,000 | 1.95 µs | 8.02 µs | 4.1×         |
+
+GMP is **4–7× faster** for GCD. GMP uses a sub-quadratic half-GCD algorithm (Lehmer/​Schönhage);
+Decimo currently uses the Euclidean algorithm. This is the smallest gap of all operations.
+
+### 5.2 BigDecimal (base-10⁹) vs GMP Integer Equivalents
+
+GMP has no native decimal type. To compare fairly, we benchmark GMP performing the
+**equivalent integer computation** that a GMP-backed BigDecimal would perform:
+
+- **Multiply**: GMP integer multiply on same-digit-count coefficients
+- **Sqrt(2) at precision P**: GMP `isqrt(2 × 10^(2P))` vs Decimo `BigDecimal.sqrt(precision=P)`
+- **Division (1/3) at precision P**: GMP `(1 × 10^P) / 3` vs Decimo `BigDecimal.true_divide(…, precision=P)`
+
+These ratios represent the **best-case speedup** from a GMP backend, before
+accounting for base-10⁹ ↔ binary conversion overhead.
+
+#### BigDecimal Multiplication (n sig-digits × n sig-digits)
+
+| Digits | GMP (int mul) | Decimo (BigDecimal) | Decimo / GMP |
+| ------ | ------------- | ------------------- | ------------ |
+| 28     | 9 ns          | 137 ns              | 15.2×        |
+| 100    | 21 ns         | 327 ns              | 15.6×        |
+| 500    | 226 ns        | 7.92 µs             | 35.1×        |
+| 1,000  | 725 ns        | 25.75 µs            | 35.5×        |
+| 5,000  | 10.72 µs      | 309.71 µs           | 28.9×        |
+| 10,000 | 24.63 µs      | 923.19 µs           | 37.5×        |
+
+BigDecimal multiply is **15–38× slower** than GMP integer multiply. This large gap comes
+from two factors: (1) BigDecimal uses base-10⁹ `BigUInt` internally (not the faster base-2³²
+`BigInt`), and (2) GMP's assembly is faster than either Decimo representation.
+
+#### BigDecimal Sqrt(2) at Various Precisions
+
+| Precision | GMP (isqrt) | Decimo (BigDecimal.sqrt) | Decimo / GMP |
+| --------- | ----------- | ------------------------ | ------------ |
+| 28        | 32 ns       | 3.48 µs                  | 109×         |
+| 100       | 112 ns      | 9.58 µs                  | 86×          |
+| 500       | 430 ns      | 93.41 µs                 | 217×         |
+| 1,000     | 1.05 µs     | 258.40 µs                | 246×         |
+| 2,000     | 2.75 µs     | 869.55 µs                | 316×         |
+| 5,000     | 11.10 µs    | 3.92 ms                  | 353×         |
+
+BigDecimal sqrt is **86–353× slower** than GMP's integer sqrt equivalent. This extreme gap
+reflects the compounding cost of Newton iteration: every iteration requires multiple
+BigDecimal multiplies and divides, each of which is individually 15–35× slower than GMP.
+
+#### BigDecimal Division (1/3) at Various Precisions
+
+| Precision | GMP (int div) | Decimo (BigDecimal.true_divide) | Decimo / GMP |
+| --------- | ------------- | ------------------------------- | ------------ |
+| 28        | 10 ns         | 426 ns                          | 42.6×        |
+| 100       | 18 ns         | 496 ns                          | 27.6×        |
+| 500       | 70 ns         | 830 ns                          | 11.9×        |
+| 1,000     | 150 ns        | 1.30 µs                         | 8.7×         |
+| 2,000     | 300 ns        | 2.30 µs                         | 7.7×         |
+| 5,000     | 800 ns        | 4.80 µs                         | 6.0×         |
+
+BigDecimal division is **6–43× slower**. The ratio decreases at higher precision because
+Decimo's division cost is dominated by the final long-division step (linear), while at
+low precision the constant overhead of BigDecimal setup is proportionally larger.
+
+### 5.3 Summary of Speedup Ratios
+
+| Operation           | Small (≤100d) | Medium (~1,000d) | Large (≥5,000d) | Key Driver                              |
+| ------------------- | ------------- | ---------------- | --------------- | --------------------------------------- |
+| BigInt Multiply     | 7–9×          | 10×              | 12–16×          | GMP asm multiply (Karatsuba/Toom/FFT)   |
+| BigInt Add          | 10–11×        | 15×              | 8–10×           | GMP asm `mpn_add_n`                     |
+| BigInt Floor Divide | 12×           | 9×               | 11×             | GMP asm multiply in Burnikel-Ziegler    |
+| BigInt Sqrt         | 5–9×          | 15×              | 18×             | Multiply gap compounds in Newton iters  |
+| BigInt GCD          | 4–6×          | 7×               | 4–7×            | GMP sub-quadratic half-GCD              |
+| BigDecimal Multiply | 15–16×        | 36×              | 29–38×          | base-10⁹ overhead + GMP asm             |
+| BigDecimal Sqrt     | 86–109×       | 246×             | 316–353×        | Newton iteration compounds the gap      |
+| BigDecimal Divide   | 28–43×        | 9×               | 6–8×            | Gap narrows as division dominates setup |
+
+### 5.4 FFI Overhead Consideration
+
+The C wrapper call path has fixed overhead:
+
+1. `external_call` dispatch: ~1–2 ns
+2. Handle array lookup: ~1–2 ns
+3. Total fixed overhead: **~3–5 ns per call**
+
+This means:
+
+- At **10 digits**: overhead is 30–100% of operation time → **GMP not beneficial**
+- At **100 digits**: overhead is 5–10% → **GMP moderately beneficial**
+- At **1,000+ digits**: overhead is <0.5% → **GMP clearly beneficial**
+
+**Recommended crossover threshold**: Use GMP backend for numbers with **≥64 words** (~600
+digits for BigInt). Below this, Decimo's native implementation avoids FFI overhead.
+
+### 5.5 Technical Note: Mojo String FFI Safety
+
+During benchmarking, we discovered a Mojo compiler issue where the optimizer may drop a
+`String` parameter's backing buffer before the `external_call` using its `unsafe_ptr()`
+completes. Under heavy allocation pressure, the freed buffer gets reused immediately,
+causing GMP's `mpz_set_str` to read corrupted data.
+
+**Workaround**: A length-aware C function `gmpw_set_str_n(handle, ptr, len)` was added that
+copies exactly `len` bytes into a null-terminated buffer before calling GMP, making the
+FFI call safe regardless of the caller's buffer lifetime. Any production integration must
+use the length-based variant (`gmpw_set_str_n`) or ensure String buffers remain live
+through explicit variable binding.
+
+---
+
+## 6. Operation Coverage Mapping
+
+### 6.1 BigInt Operations → GMP Functions
+
+| Decimo BigInt Operation  | GMP Function (via wrapper)          | Status         |
+| ------------------------ | ----------------------------------- | -------------- |
+| `__add__`                | `mpz_add` → `gmpw_add`              | ✓ Tested       |
+| `__sub__`                | `mpz_sub` → `gmpw_sub`              | ✓ Tested       |
+| `__mul__`                | `mpz_mul` → `gmpw_mul`              | ✓ Tested       |
+| `__floordiv__`           | `mpz_fdiv_q` → `gmpw_fdiv_q`        | ✓ Tested       |
+| `__truediv__`            | `mpz_tdiv_q` → `gmpw_tdiv_q`        | ✓ Tested       |
+| `__mod__`                | `mpz_mod` → `gmpw_mod`              | ✓ Tested       |
+| `__pow__`                | `mpz_pow_ui` → `gmpw_pow_ui`        | ✓ Tested       |
+| `mod_pow`                | `mpz_powm` → `gmpw_powm`            | ✓ Tested       |
+| `sqrt` / `isqrt`         | `mpz_sqrt` → `gmpw_sqrt`            | ✓ Tested       |
+| `gcd`                    | `mpz_gcd` → `gmpw_gcd`              | ✓ Tested       |
+| `__neg__`                | `mpz_neg` → `gmpw_neg`              | ✓ Tested       |
+| `__abs__`                | `mpz_abs` → `gmpw_abs`              | ✓ Tested       |
+| `__eq__`, `__lt__`, etc. | `mpz_cmp` → `gmpw_cmp`              | ✓ Tested       |
+| `__and__`                | `mpz_and` → `gmpw_and`              | ✓ Tested       |
+| `__or__`                 | `mpz_ior` → `gmpw_or`               | ✓ Tested       |
+| `__xor__`                | `mpz_xor` → `gmpw_xor`              | ✓ Tested       |
+| `__lshift__`             | `mpz_mul_2exp` → `gmpw_lshift`      | ✓ Tested       |
+| `__rshift__`             | `mpz_fdiv_q_2exp` → `gmpw_rshift`   | ✓ Tested       |
+| `lcm`                    | `mpz_lcm`                           | Wrapper needed |
+| `extended_gcd`           | `mpz_gcdext`                        | Wrapper needed |
+| `mod_inverse`            | `mpz_invert`                        | Wrapper needed |
+| `from_string`            | `mpz_set_str` → `gmpw_set_str`      | ✓ Tested       |
+| `__str__`                | `mpz_get_str` → `gmpw_get_str_buf`  | ✓ Tested       |
+| Word import              | `mpz_import` → `gmpw_import_u32_le` | ! Partial      |
+| Word export              | `mpz_export` → `gmpw_export_u32_le` | ! Partial      |
+
+### 6.2 BigDecimal Operations → GMP Coverage
+
+BigDecimal operations would use GMP indirectly by converting the coefficient (BigUInt) to/from GMP:
+
+| Operation   | GMP Applicability | Notes                                                                  |
+| ----------- | ----------------- | ---------------------------------------------------------------------- |
+| Add/Sub     | ! Low value       | Scale alignment dominates; coefficient add is cheap                    |
+| Multiply    | ✓ High value      | Large coefficient multiplication benefits from GMP                     |
+| Divide      | ✓ High value      | Newton iteration on large coefficients benefits                        |
+| Sqrt        | ✓ High value      | Large-precision sqrt benefits from GMP's optimized Newton              |
+| Exp/Ln/Trig | ! Mixed           | Series evaluation is Mojo-managed; inner multiplications could use GMP |
+| Round       | ✗ No value        | Pure digit manipulation, no heavy arithmetic                           |
+
+### 6.3 Operations NOT in GMP
+
+These must remain native Mojo:
+
+- Decimal rounding (all modes)
+- Scale/exponent management
+- Base-10 digit manipulation
+- Trigonometric series evaluation logic
+- TOML parsing
+- CLI calculator evaluation
+- Dec128 operations (fixed-precision)
+
+---
+
+## 7. Data Conversion Between Decimo and GMP
+
+### 7.1 BigInt ↔ GMP (Efficient)
+
+BigInt uses base-2^32 `List[UInt32]` in little-endian order — essentially the same representation as GMP's `mpz_t` but with 32-bit limbs instead of 64-bit.
+
+**Conversion approach**:
+
+```txt
+BigInt.words: [w0, w1, w2, w3, ...]  (UInt32, little-endian)
+GMP limbs:   [w0|w1, w2|w3, ...]     (UInt64, little-endian)
+```
+
+Using `mpz_import`:
+
+```c
+// Import BigInt words into GMP
+mpz_import(z, word_count, -1, sizeof(uint32_t), 0, 0, words_ptr);
+// -1 = least significant word first (little-endian word order)
+// sizeof(uint32_t) = 4 bytes per word
+// 0 = native byte order within word
+// 0 = no nail bits
+```
+
+Using `mpz_export`:
+
+```c
+// Export GMP value to BigInt words
+size_t count;
+mpz_export(words_ptr, &count, -1, sizeof(uint32_t), 0, 0, z);
+```
+
+**Cost**: O(n) memory copy — essentially free compared to the arithmetic operations.
+
+**Sign handling**: BigInt stores sign as separate `Bool`; GMP stores sign in `_mp_size` (negative = negative). Trivial to map.
+
+**Implementation in C wrapper** (already prototyped):
+
+```c
+void gmpw_import_u32_le(int handle, const uint32_t* words, int count);
+int  gmpw_export_u32_le(int handle, uint32_t* out_buf, int max_count);
+```
+
+### 7.2 BigUInt ↔ GMP (Expensive)
+
+BigUInt uses base-10^9 — a fundamentally different number base than GMP's binary format.
+
+**Conversion options**:
+
+| Method             | Complexity | Description                                                    |
+| ------------------ | ---------- | -------------------------------------------------------------- |
+| String round-trip  | O(n²)      | `BigUInt → String → mpz_set_str` — simple but slow             |
+| Horner's method    | O(n²)      | Evaluate polynomial: `((w[n-1] × 10^9 + w[n-2]) × 10^9 + ...)` |
+| Divide-and-conquer | O(n·log²n) | Split at midpoint, convert halves, combine with shift+add      |
+
+**Recommended approach**:
+
+- For <1000 words: Horner's method (simpler, O(n²) acceptable)
+- For ≥1000 words: Divide-and-conquer using GMP's own multiply for combining
+
+**Break-even analysis**: If we assume base conversion costs ~O(n²) for n words:
+
+- At 100 words (~900 digits): conversion ~10 µs, GMP multiply saves ~4 µs → **NOT worth it**
+- At 1000 words (~9000 digits): conversion ~1 ms, GMP multiply saves ~150 µs → **NOT worth it for single operation**
+- **Conclusion**: BigUInt→GMP conversion is only justified when amortized across many operations (e.g., performing multiple root/exp iterations on the same number).
+
+### 7.3 Recommended Strategy
+
+```txt
+                    ┌──────────────────────────────────────┐
+ BigInt operations  │  Direct word import/export (O(n))    │ → Phase 1
+                    │  Cheap, always beneficial > 64 words │
+                    └──────────────────────────────────────┘
+
+                    ┌──────────────────────────────────────┐
+ BigDecimal ops     │  Convert once, compute many times    │ → Phase 2
+ (iterative: sqrt,  │  Cache GMP representation during     │
+  exp, ln, etc.)    │  multi-step computations             │
+                    └──────────────────────────────────────┘
+
+                    ┌──────────────────────────────────────┐
+ BigDecimal ops     │  Skip GMP — conversion cost          │ → Phase 3 (or skip)
+ (single-shot:      │  exceeds arithmetic savings          │
+  add, sub, round)  │                                      │
+                    └──────────────────────────────────────┘
+```
+
+---
+
+## 8. GMP Detection and Fallback Strategy
+
+### 8.1 Detection Approaches
+
+Since Mojo 0.26.2 lacks `DLHandle` for runtime library probing, detection must be **build-time** or **semi-static**:
+
+#### Option A: Compile-Time Feature Flag (Recommended for Now)
+
+```toml
+# pixi.toml
+[feature.gmp]
+tasks.package-gmp = "mojo build -DGMP_ENABLED -Xlinker -lgmp_wrapper ..."
+```
+
+```mojo
+# In Decimo source (when Mojo supports comptime parameters)
+@parameter
+if GMP_ENABLED:
+    alias _engine = GMPEngine
+else:
+    alias _engine = NativeEngine
+```
+
+**Pros**: Zero overhead, no runtime branching  
+**Cons**: Requires separate builds for GMP and non-GMP versions
+
+#### Option B: Probe Function at Module Load
+
+```mojo
+# Use the C wrapper's availability check
+fn _check_gmp_available() -> Bool:
+    """Call gmpw_available() which returns 1 if GMP is linked."""
+    try:
+        var result = external_call["gmpw_available", Int32]()
+        return result == 1
+    except:
+        return False
+
+# Cache at first use (workaround for no global variables)
+struct _GMPState:
+    var checked: Bool
+    var available: Bool
+```
+
+**Pros**: Single binary works with or without GMP  
+**Cons**: Requires the wrapper to be linked (even if GMP is absent)
+
+#### Option C: Two-Library Approach
+
+Build two versions of the wrapper:
+
+1. `libgmp_wrapper.dylib` — full GMP wrapper (requires GMP)
+2. `libgmp_stub.dylib` — stub that returns "unavailable" for all functions
+
+Link against whichever is present. The stub library lets Decimo compile and run without GMP.
+
+**Pros**: Clean separation, single Mojo codebase  
+**Cons**: Requires distributing two library variants
+
+### 8.2 Recommended Detection Strategy
+
+**Phase 1 (immediate)**: Use **Option A** — compile-time feature flag. Build two package variants:
+
+- `decimo.mojopkg` — native only (default)
+- `decimo_gmp.mojopkg` — with GMP backend
+
+**Phase 2 (when Mojo matures)**: Migrate to **Option B** with `DLHandle` when available, enabling true runtime detection.
+
+### 8.3 Fallback Behavior
+
+```txt
+User calls BigInt.multiply(a, b):
+  1. Check engine preference (from parameter or default)
+  2. If engine == GMP and gmp_available:
+     a. Check size threshold (≥64 words?)
+     b. If yes: convert words → GMP, compute, convert back
+     c. If no: use native Mojo implementation
+  3. If engine == Native or !gmp_available:
+     → Use native Mojo implementation (current code)
+```
+
+---
+
+## 9. API Design: `gmp: Bool = False` Parameter
+
+### 9.1 Design Rationale
+
+Since Mojo lacks global variables, a module-level context or engine setting is not
+possible. Instead, we add a **`gmp: Bool = False`** keyword parameter to compute-heavy
+BigDecimal functions. This is:
+
+- **Explicit** — users opt in per-call; no hidden magic
+- **Backward-compatible** — defaults to `False`, existing code unchanged
+- **Composable** — users pass `gmp=True` down through complex expressions like π calculation
+- **Simple** — a Bool is clearer than an engine enum for two choices
+
+### 9.2 Which Functions Get the `gmp` Parameter
+
+Only **iterative compute-heavy** operations — not add/sub/multiply (which are building blocks
+called internally and whose overhead is handled by the GMPDecimal struct).
+
+| Function                                   | Gets `gmp`? | Reason                                         |
+| ------------------------------------------ | ----------- | ---------------------------------------------- |
+| `true_divide(x, y, precision, gmp)`        | ✓ Yes       | Long division at high precision benefits       |
+| `sqrt(x, precision, gmp)`                  | ✓ Yes       | Newton iteration: 86–353× gap                  |
+| `root(x, n, precision, gmp)`               | ✓ Yes       | Newton iteration like sqrt                     |
+| `exp(x, precision, gmp)`                   | ✓ Yes       | Taylor series with many multiplications        |
+| `ln(x, precision, gmp)`                    | ✓ Yes       | Atanh-based approach with iterative multiplies |
+| `sin(x, precision, gmp)`                   | ✓ Yes       | Taylor series                                  |
+| `cos(x, precision, gmp)`                   | ✓ Yes       | Taylor series                                  |
+| `tan(x, precision, gmp)`                   | ✓ Yes       | Uses sin/cos                                   |
+| `pi(precision, gmp)`                       | ✓ Yes       | Chudnovsky binary splitting: many large muls   |
+| `BigDecimal.__add__`, `__sub__`, `__mul__` | ✗ No        | Handled internally by GMPDecimal               |
+| `BigDecimal.__floordiv__`, `round`         | ✗ No        | Digit manipulation, no heavy arithmetic        |
+
+### 9.3 Proposed Signatures
+
+```mojo
+# Free functions (primary API)
+def true_divide(x: BigDecimal, y: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def sqrt(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def root(x: BigDecimal, n: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def exp(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def ln(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def sin(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def cos(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def tan(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+def pi(precision: Int, gmp: Bool = False) raises -> BigDecimal:
+
+# Instance methods (delegate to free functions)
+def sqrt(self, precision: Int = PRECISION, gmp: Bool = False) raises -> Self:
+    return exponential.sqrt(self, precision, gmp)
+```
+
+### 9.4 Why MPFR, Not `mpz_t` with Manual Scale
+
+GMP provides three numeric types:
+
+| Type     | What it stores                                      | Mojo analogue | Use for BigDecimal?                                             |
+| -------- | --------------------------------------------------- | ------------- | --------------------------------------------------------------- |
+| `mpz_t`  | Arbitrary-precision **integer**                     | BigInt        | ✗ Poor — must manually track scale, sign, scale-up for division |
+| `mpf_t`  | Arbitrary-precision **binary float**                | (none)        | △ Works but legacy, no correct rounding                         |
+| `mpfr_t` | Arbitrary-precision **binary float** (MPFR library) | BigDecimal    | **✓ Best** — correctly rounded, built-in exp/ln/sin/cos/π       |
+
+If we used `mpz_t` (integer only), we'd need to:
+
+- Track `scale` and `sign` separately in GMPDecimal (3 fields)
+- Manually scale up the numerator by 10^N before integer division
+- Implement Newton iterations for sqrt, Taylor series for exp/ln/trig — **ourselves**
+- Handle scale alignment for addition/subtraction
+
+With **MPFR** (`mpfr_t`), all of this is handled internally:
+
+- Division: `mpfr_div(r, a, b, MPFR_RNDN)` — one call
+- Sqrt: `mpfr_sqrt(r, a, MPFR_RNDN)` — one call
+- Exp: `mpfr_exp(r, a, MPFR_RNDN)` — one call, correctly rounded
+- Ln: `mpfr_log(r, a, MPFR_RNDN)` — one call
+- Sin/Cos/Tan: `mpfr_sin`, `mpfr_cos`, `mpfr_tan` — one call each
+- π: `mpfr_const_pi(r, MPFR_RNDN)` — one call, arbitrary precision
+
+**MPFR** is a separate library built on top of GMP. It is the standard for
+arbitrary-precision floating-point and is what Python's `mpmath` (via `gmpy2`)
+uses under the hood. It is widely available: `brew install mpfr` (macOS),
+`apt install libmpfr-dev` (Linux).
+
+### 9.5 The GMPDecimal Intermediate Struct (MPFR-Based)
+
+Because MPFR natively handles sign, scale, and precision, the struct simplifies
+to **a single field** — the MPFR handle:
+
+```mojo
+struct GMPDecimal:
+    """A decimal value held in MPFR floating-point representation.
+    
+    Wraps a single mpfr_t handle. All arithmetic, sqrt, exp, ln, trig
+    are single MPFR calls. Convert to/from BigDecimal only at entry/exit.
+    """
+    var handle: Int32      # MPFR handle (via C wrapper)
+
+    def __init__(out self, bd: BigDecimal, precision: Int):
+        """Convert BigDecimal → GMPDecimal.
+        
+        precision: decimal digits wanted → converted to bits internally.
+        """
+        # 1 decimal digit ≈ 3.322 bits; add guard bits
+        var bits = precision * 4 + 64
+        self.handle = mpfrw_init(bits)
+        # Pass full decimal string (e.g. "-3.14159") directly
+        mpfrw_set_str(self.handle, str(bd))
+
+    def to_bigdecimal(self, precision: Int) -> BigDecimal:
+        """Convert GMPDecimal → BigDecimal.
+        
+        Exports MPFR value as decimal string, truncated to `precision` digits.
+        """
+        var s = mpfrw_get_str(self.handle, precision)
+        return BigDecimal(s)
+
+    # Arithmetic: each is a single MPFR call
+    def multiply(self, other: GMPDecimal) -> GMPDecimal: ...
+    def add(self, other: GMPDecimal) -> GMPDecimal: ...
+    def divide(self, other: GMPDecimal) -> GMPDecimal: ...
+    def sqrt(self) -> GMPDecimal: ...
+    def exp(self) -> GMPDecimal: ...
+    def ln(self) -> GMPDecimal: ...
+    def sin(self) -> GMPDecimal: ...
+    def cos(self) -> GMPDecimal: ...
+    def pi(precision: Int) -> GMPDecimal: ...  # static
+
+    def __del__(owned self):
+        """RAII: free MPFR handle on destruction."""
+        mpfrw_clear(self.handle)
+```
+
+Compare the old vs. new struct:
+
+| Aspect          | Old (mpz_t + manual scale)                  | New (MPFR)                    |
+| --------------- | ------------------------------------------- | ----------------------------- |
+| Fields          | `handle`, `scale`, `sign` (3 fields)        | `handle` (1 field)            |
+| Division        | Scale up by 10^N, `mpz_tdiv_q`, track scale | `mpfr_div(r, a, b, RNDN)`     |
+| Sqrt            | Newton iteration in Mojo using GMPDecimal   | `mpfr_sqrt(r, a, RNDN)`       |
+| Exp             | Taylor series in Mojo using GMPDecimal      | `mpfr_exp(r, a, RNDN)`        |
+| Ln              | Atanh series in Mojo using GMPDecimal       | `mpfr_log(r, a, RNDN)`        |
+| Sin/Cos/Tan     | Taylor series in Mojo                       | `mpfr_sin` / `mpfr_cos` / ... |
+| π               | Chudnovsky in Mojo using GMPDecimal         | `mpfr_const_pi(r, RNDN)`      |
+| Sign handling   | Manual XOR                                  | Automatic                     |
+| Scale alignment | Manual multiply by 10^diff                  | Automatic                     |
+| Rounding        | Manual truncation                           | Correctly rounded (MPFR)      |
+
+**Data flow for `gmp=True`**:
+
+```txt
+sqrt(x: BigDecimal, precision=1000, gmp=True):
+  1. gx = GMPDecimal(x, precision=1010)     ← ONE conversion: string → mpfr_t
+  2. mpfr_sqrt(result, gx, MPFR_RNDN)      ← ONE GMP call (entire computation)
+  3. return result.to_bigdecimal(1000)      ← ONE conversion: mpfr_t → string
+```
+
+For a chained expression like `sqrt(23) * exp(2.1) / pi()`:
+
+```txt
+var a = GMPDecimal(BigDecimal("23"), prec)    ← 1 conversion
+var b = GMPDecimal(BigDecimal("2.1"), prec)   ← 1 conversion
+var sa = a.sqrt()                              ← mpfr_sqrt (one call)
+var eb = b.exp()                               ← mpfr_exp (one call)
+var p  = GMPDecimal.pi(prec)                   ← mpfr_const_pi (one call)
+var r  = sa.multiply(eb).divide(p)             ← mpfr_mul + mpfr_div
+return r.to_bigdecimal(precision)              ← 1 conversion
+```
+
+Total: 3 conversions (2 in + 1 out) + 5 MPFR calls. No Newton iterations,
+no Taylor series, no manual scale tracking.
+
+### 9.6 Behavioral Differences Between `gmp=False` and `gmp=True`
+
+**Important**: MPFR is a **binary** floating-point internally (mantissa stored in base-2).
+This means:
+
+1. **Precision is in bits, not digits** — We convert: `bits = ceil(digits × 3.322)`. To
+   guarantee N correct decimal digits, we request ~N×4 bits (generous guard).
+
+2. **Last 1–2 decimal digits may differ** — Binary↔decimal rounding means the last few
+   digits of `gmp=True` output may not match `gmp=False` output exactly. The numerical
+   value will agree to the requested precision.
+
+3. **Fix: guard digits** — Compute internally with `precision + 10` extra digits, then
+   truncate the result to the requested precision. This is standard practice (Python's
+   `decimal` module does the same).
+
+4. **MPFR guarantees correct rounding** — Every operation is correctly rounded to the
+   working precision. This is actually *stronger* than Decimo's native implementation
+   which may accumulate rounding errors across Newton iterations.
+
+This is why `gmp` defaults to `False` — so existing behavior is preserved and users
+explicitly opt into the MPFR path when they want speed. In practice, the numerical
+results from MPFR are at least as accurate as the native implementation.
+
+### 9.6 CLI Integration (Deferred)
+
+CLI integration is deferred to a later phase. When implemented, a `--gmp` flag on the
+command line would set `gmp=True` for all compute-heavy operations in the session.
+The evaluator would pass this flag down through the function dispatch.
+
+### 9.7 Future: Global Variable or Context
+
+When Mojo adds global variables, the `gmp` parameter can be replaced or supplemented by
+a module-level default:
+
+```mojo
+# Future (when Mojo supports global variables)
+BigDecimal.set_default_engine("gmp")  # Sets global default
+var result = sqrt(x, 1000)            # Automatically uses GMP
+```
+
+The `gmp` parameter will remain as an explicit override even after global variables
+are available, for cases where the user wants fine-grained control.
+
+---
+
+## 10. Build System Integration
+
+### 10.1 C Wrapper Compilation
+
+The C wrapper must be compiled as a shared library and distributed alongside the Mojo package.
+
+**Build script** (`build_gmp_wrapper.sh`):
+
+```bash
+#!/bin/bash
+# Detect GMP installation
+GMP_PREFIX=$(brew --prefix gmp 2>/dev/null || echo "/usr/local")
+
+if [ ! -f "$GMP_PREFIX/include/gmp.h" ]; then
+    echo "GMP not found. Building without GMP support."
+    exit 0
+fi
+
+# Compile wrapper
+cc -shared -O2 -o libgmp_wrapper.dylib gmp_wrapper.c \
+   -I"$GMP_PREFIX/include" -L"$GMP_PREFIX/lib" -lgmp
+
+# Fix install name (macOS)
+if [[ "$(uname)" == "Darwin" ]]; then
+    install_name_tool -id @rpath/libgmp_wrapper.dylib libgmp_wrapper.dylib
+fi
+```
+
+### 10.2 pixi.toml Integration
+
+```toml
+[tasks]
+build-gmp-wrapper = """
+    bash src/decimo/gmp/build_gmp_wrapper.sh
+"""
+
+package-with-gmp = { depends-on = ["build-gmp-wrapper"] }
+package-with-gmp.cmd = """
+    mojo package src/decimo -o decimo.mojopkg && \
+    cp src/decimo/gmp/libgmp_wrapper.dylib .
+"""
+```
+
+### 10.3 Mojo Build Flags
+
+For building executables that use Decimo with GMP:
+
+```bash
+# macOS (Homebrew)
+mojo build \
+  -Xlinker -L./path/to/wrapper -Xlinker -lgmp_wrapper \
+  -Xlinker -L/opt/homebrew/lib -Xlinker -lgmp \
+  -Xlinker -rpath -Xlinker ./path/to/wrapper \
+  -Xlinker -rpath -Xlinker /opt/homebrew/lib \
+  -o myprogram myprogram.mojo
+
+# Linux (system GMP)
+mojo build \
+  -Xlinker -L./path/to/wrapper -Xlinker -lgmp_wrapper \
+  -Xlinker -lgmp \
+  -Xlinker -rpath -Xlinker ./path/to/wrapper \
+  -o myprogram myprogram.mojo
+```
+
+### 10.4 Runtime Detection: Single Binary Without Link-Time Dependencies
+
+**Goal**: A single compiled binary works whether or not MPFR/GMP is installed on the
+target machine. If `gmp=True` is requested and MPFR is found, use it. If not found,
+raise a helpful error. Users who never call `gmp=True` pay zero cost.
+
+**Verified**: `dlopen` works from Mojo 0.26.2 via `external_call["dlopen", ...]`.
+Tested successfully — GMP was loaded at runtime without any `-Xlinker` flags at
+compile time.
+
+**Architecture**: The C wrapper does lazy loading internally using `dlopen`/`dlsym`,
+so it compiles and links **without** needing MPFR/GMP headers or libraries present:
+
+```c
+// gmp_wrapper.c — always compilable, no link-time dependency on libmpfr
+#include <dlfcn.h>
+
+static void *mpfr_lib = NULL;
+static int   mpfr_loaded = 0;
+
+// Function pointer types
+typedef int  (*mpfr_init2_t)(void*, long);
+typedef void (*mpfr_clear_t)(void*);
+typedef int  (*mpfr_set_str_t)(void*, const char*, int, int);
+// ... one typedef per MPFR function used
+
+static mpfr_init2_t   p_mpfr_init2   = NULL;
+static mpfr_clear_t   p_mpfr_clear   = NULL;
+static mpfr_set_str_t p_mpfr_set_str = NULL;
+// ... etc.
+
+int mpfrw_load(void) {
+    if (mpfr_loaded) return mpfr_lib != NULL;
+    mpfr_loaded = 1;
+
+    // Try common library paths
+    const char *paths[] = {
+        "libmpfr.dylib",                          // macOS rpath
+        "/opt/homebrew/lib/libmpfr.dylib",         // macOS ARM64
+        "/usr/local/lib/libmpfr.dylib",            // macOS x86_64
+        "libmpfr.so",                              // Linux rpath
+        "/usr/lib/x86_64-linux-gnu/libmpfr.so",   // Debian/Ubuntu
+        "/usr/lib/libmpfr.so",                     // Generic Linux
+        NULL
+    };
+
+    for (int i = 0; paths[i]; i++) {
+        mpfr_lib = dlopen(paths[i], RTLD_LAZY);
+        if (mpfr_lib) break;
+    }
+    if (!mpfr_lib) return 0;
+
+    // Resolve all needed symbols
+    p_mpfr_init2   = dlsym(mpfr_lib, "mpfr_init2");
+    p_mpfr_clear   = dlsym(mpfr_lib, "mpfr_clear");
+    p_mpfr_set_str = dlsym(mpfr_lib, "mpfr_set_str");
+    // ... resolve all function pointers
+
+    return (p_mpfr_init2 && p_mpfr_clear && p_mpfr_set_str /* && ... */);
+}
+
+int mpfrw_available(void) {
+    return mpfrw_load();
+}
+```
+
+**On the Mojo side**, the wrapper is the only native dependency:
+
+```mojo
+def sqrt(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
+    if not gmp:
+        return _sqrt_native(x, precision)
+
+    # Check MPFR availability at runtime
+    var available = external_call["mpfrw_available", c_int]()
+    if available == 0:
+        raise Error(
+            "gmp=True requires MPFR. Install: brew install mpfr (macOS) "
+            "or apt install libmpfr-dev (Linux)"
+        )
+    return _sqrt_mpfr(x, precision)
+```
+
+**Build simplification**: With lazy loading, the build command for users becomes:
+
+```bash
+# Only need to link the small wrapper — no -lmpfr or -lgmp needed
+mojo build -Xlinker -L./path/to/wrapper -Xlinker -ldecimo_gmp_wrapper \
+           -Xlinker -rpath -Xlinker ./path/to/wrapper \
+           -o myprogram myprogram.mojo
+```
+
+The wrapper itself is a tiny C file (~200 lines) that can be compiled on any system
+with just a C compiler — no GMP/MPFR headers needed. MPFR is only loaded at runtime
+when `gmp=True` is actually called.
+
+**Summary of approach by timeline**:
+
+| Timeline              | Detection Method             | Binary Requirements                              |
+| --------------------- | ---------------------------- | ------------------------------------------------ |
+| **Now** (Mojo 0.26.2) | C wrapper + `dlopen`/`dlsym` | Link wrapper only; MPFR loaded lazily at runtime |
+| **Future** (DLHandle) | Mojo-native `DLHandle`       | Pure Mojo detection, no C wrapper needed         |
+
+---
+
+## 11. Cross-Platform Considerations
+
+### 11.1 Platform Matrix
+
+| Platform         | GMP Availability  | Wrapper Format | Install Method           | Notes                       |
+| ---------------- | ----------------- | -------------- | ------------------------ | --------------------------- |
+| **macOS ARM64**  | ✓ Homebrew        | `.dylib`       | `brew install gmp`       | **Tested ✓**                |
+| **macOS x86_64** | ✓ Homebrew        | `.dylib`       | `brew install gmp`       | Should work (untested)      |
+| **Linux x86_64** | ✓ System packages | `.so`          | `apt install libgmp-dev` | Expected to work            |
+| **Linux ARM64**  | ✓ System packages | `.so`          | `apt install libgmp-dev` | Expected to work            |
+| **Windows**      | ! Complex         | `.dll`         | MSYS2/vcpkg/MPIR         | Requires MinGW or MPIR fork |
+
+### 11.2 GMP Detection by Platform
+
+```bash
+# macOS
+GMP_PREFIX=$(brew --prefix gmp 2>/dev/null)
+# → /opt/homebrew/Cellar/gmp/6.3.0
+
+# Linux (Debian/Ubuntu)
+dpkg -l libgmp-dev 2>/dev/null | grep -q '^ii'
+# Headers: /usr/include/gmp.h
+# Library: /usr/lib/x86_64-linux-gnu/libgmp.so
+
+# Linux (generic)
+pkg-config --exists gmp && pkg-config --cflags --libs gmp
+```
+
+### 11.3 Platform-Specific Considerations
+
+**macOS**:
+
+- Homebrew puts GMP in `/opt/homebrew/` (ARM64) or `/usr/local/` (x86_64)
+- Must set `@rpath` for `.dylib` loading
+- SIP may restrict library search paths
+
+**Linux**:
+
+- System GMP is typically in standard paths (`/usr/lib/`, `/usr/include/`)
+- No rpath issues with standard paths
+- May need `-lgmp` at end of link flags for some linkers
+
+**Windows** (future):
+
+- GMP does not natively support MSVC
+- Options: MPIR (Windows-friendly GMP fork), MinGW-compiled GMP, or Cygwin
+- Different shared library format (`.dll` + import lib)
+- Mojo on Windows is still early — defer this concern
+
+---
+
+## 12. Risk Analysis
+
+### 12.1 Technical Risks
+
+| Risk                                    | Likelihood | Impact | Mitigation                                                 |
+| --------------------------------------- | ---------- | ------ | ---------------------------------------------------------- |
+| Mojo FFI API changes in future versions | HIGH       | MEDIUM | Isolate FFI calls in single module; version-gate if needed |
+| GMP version incompatibility             | LOW        | LOW    | Target GMP ≥6.0 (stable ABI); C wrapper abstracts details  |
+| Handle exhaustion (4096 limit)          | LOW        | MEDIUM | Add handle recycling; expand pool; document limit          |
+| Memory leaks from uncleared handles     | MEDIUM     | MEDIUM | Add `__del__` integration; RAII pattern in Mojo wrapper    |
+| Base conversion overhead for BigDecimal | HIGH       | MEDIUM | Only use GMP for iterative operations; cache conversions   |
+| Build complexity increase               | MEDIUM     | MEDIUM | Provide clear build scripts; make GMP optional             |
+
+### 12.2 Ecosystem Risks
+
+| Risk                            | Likelihood | Impact | Mitigation                                                |
+| ------------------------------- | ---------- | ------ | --------------------------------------------------------- |
+| Users don't have GMP installed  | HIGH       | LOW    | GMP is optional; native Mojo is always available          |
+| Package distribution complexity | MEDIUM     | MEDIUM | Document build steps; consider bundling prebuilt wrappers |
+| Mojo package system changes     | MEDIUM     | MEDIUM | Keep wrapper as separate build artifact                   |
+
+### 12.3 Performance Risks
+
+| Risk                                     | Description                          | Mitigation                                       |
+| ---------------------------------------- | ------------------------------------ | ------------------------------------------------ |
+| FFI overhead dominates for small numbers | Measured: ~3–5 ns overhead per call  | Size threshold (≥64 words) before using GMP      |
+| String-based conversion bottleneck       | O(n²) for string round-trip          | Use binary word import/export instead            |
+| Unnecessary conversions                  | Converting back and forth repeatedly | Cache GMP handles during multi-step computations |
+| GMP slower for specific operations       | Possible edge cases                  | Benchmark-driven thresholds per operation type   |
+
+---
+
+## 13. Implementation Roadmap
+
+> Priority: **BigDecimal first**, using **MPFR** (not raw `mpz_t`). MPFR provides
+> built-in sqrt, exp, ln, sin, cos, tan, π with correct rounding — so most BigDecimal
+> operations are simple one-call wrappers rather than Newton/Taylor implementations.
+> BigInt integration is deferred to Phase 3.
+
+### Phase 1: Foundation — MPFR C Wrapper & GMPDecimal Struct
+
+**Goal**: Establish the MPFR-based C wrapper with runtime detection, the `GMPDecimal`
+intermediate struct, and prove end-to-end with `sqrt`.
+
+| Step | Task                                                | Detail                                                                                                                                                                                                                                                                                                           | Effort | Deps     |
+| ---- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
+| 1.1  | **Install MPFR**                                    | `brew install mpfr` (macOS). Verify: `/opt/homebrew/lib/libmpfr.dylib` exists, `#include <mpfr.h>` compiles. MPFR depends on GMP (already installed).                                                                                                                                                            | Tiny   | —        |
+| 1.2  | **Extend C wrapper with `mpfr_t` handle pool**      | Add to `gmp_wrapper.c`: (a) `static mpfr_t f_handles[MAX_HANDLES]` pool, (b) `mpfrw_init(prec_bits) → handle`, (c) `mpfrw_clear(h)`, (d) `mpfrw_set_str(h, str, len)` (length-safe, base-10), (e) `mpfrw_get_str(h, digits) → char*`, (f) `mpfrw_available()`. Keep existing `mpz_t` wrappers intact for BigInt. | Medium | 1.1      |
+| 1.3  | **Add MPFR arithmetic wrappers**                    | One-line C wrappers: `mpfrw_add(r,a,b)`, `mpfrw_sub(r,a,b)`, `mpfrw_mul(r,a,b)`, `mpfrw_div(r,a,b)`, `mpfrw_sqrt(r,a)`, `mpfrw_neg(r,a)`, `mpfrw_abs(r,a)`, `mpfrw_cmp(a,b)`. All use `MPFR_RNDN` (round-to-nearest). Each is 1 line: e.g. `mpfr_sqrt(f_handles[r], f_handles[a], MPFR_RNDN)`.                   | Small  | 1.2      |
+| 1.4  | **Add MPFR transcendental wrappers**                | `mpfrw_exp(r,a)`, `mpfrw_log(r,a)`, `mpfrw_sin(r,a)`, `mpfrw_cos(r,a)`, `mpfrw_tan(r,a)`, `mpfrw_const_pi(r)`, `mpfrw_pow(r,a,b)`, `mpfrw_rootn_ui(r,a,n)`. Each is 1 line.                                                                                                                                      | Small  | 1.2      |
+| 1.5  | **Implement lazy `dlopen` loading**                 | Wrap all MPFR calls behind function pointers resolved via `dlopen`/`dlsym` at first use (see Section 10.4). This makes the wrapper compilable without MPFR headers. Add `mpfrw_available() → 0/1`.                                                                                                               | Medium | 1.3, 1.4 |
+| 1.6  | **Compile wrapper, run smoke test**                 | Build `libdecimo_gmp_wrapper.dylib` linking against `-lmpfr -lgmp`. Write minimal Mojo test: `mpfrw_init(200) → set_str("3.14") → mpfrw_sqrt → get_str → print`. Verify output.                                                                                                                                  | Small  | 1.5      |
+| 1.7  | **Create `src/decimo/bigdecimal/gmp_decimal.mojo`** | Implement `GMPDecimal` struct (single `handle: Int32` field). Constructor from BigDecimal via `str(bd) → mpfrw_set_str`. `to_bigdecimal(precision) → mpfrw_get_str → BigDecimal(s)`. Destructor calls `mpfrw_clear`.                                                                                             | Medium | 1.6      |
+| 1.8  | **Wire `gmp=True` into `sqrt()`**                   | In `exponential.mojo`: add `gmp: Bool = False` parameter. If True: `GMPDecimal(x, prec+10) → mpfrw_sqrt → to_bigdecimal(prec)`. Three lines of new code.                                                                                                                                                         | Small  | 1.7      |
+| 1.9  | **Test sqrt correctness**                           | For x ∈ {2, 3, 0.5, 100, 10⁻⁸, 10¹⁰⁰⁰}, P ∈ {28, 100, 500, 1000, 5000}: verify `sqrt(x, P, gmp=True)` matches `sqrt(x, P)` to P digits.                                                                                                                                                                          | Medium | 1.8      |
+| 1.10 | **Benchmark sqrt**                                  | Measure wall time including conversion overhead. Find break-even precision where MPFR wins. Expected: above ~50–100 digits.                                                                                                                                                                                      | Small  | 1.9      |
+| 1.11 | **Update build system**                             | `pixi.toml` tasks: `build-gmp-wrapper` (compiles C wrapper + links MPFR). `build_gmp_wrapper.sh` for macOS + Linux.                                                                                                                                                                                              | Small  | 1.2      |
+
+**Deliverable**: `sqrt(BigDecimal("2"), precision=5000, gmp=True)` returns correct result.
+Internally: 1 string conversion in, 1 MPFR call, 1 string conversion out.
+
+### Phase 2: All BigDecimal Operations
+
+**Goal**: Extend `gmp=True` to all compute-heavy operations. Because MPFR has built-in
+functions for each, every operation is just a **thin wrapper** — no Newton/Taylor needed.
+
+| Step | Task                                      | Detail                                                                                                                                                       | Effort | Deps    |
+| ---- | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ------- |
+| 2.1  | **Wire `true_divide(x, y, P, gmp=True)`** | `GMPDecimal(x) → GMPDecimal(y) → mpfrw_div → to_bigdecimal`.                                                                                                 | Small  | Phase 1 |
+| 2.2  | **Wire `root(x, n, P, gmp=True)`**        | `mpfrw_rootn_ui(r, x_handle, n)` — single MPFR call.                                                                                                         | Small  | Phase 1 |
+| 2.3  | **Wire `exp(x, P, gmp=True)`**            | `mpfrw_exp(r, x_handle)` — single MPFR call.                                                                                                                 | Small  | Phase 1 |
+| 2.4  | **Wire `ln(x, P, gmp=True)`**             | `mpfrw_log(r, x_handle)` — single MPFR call.                                                                                                                 | Small  | Phase 1 |
+| 2.5  | **Wire `sin(x, P, gmp)`, `cos`, `tan`**   | `mpfrw_sin`, `mpfrw_cos`, `mpfrw_tan` — each single call.                                                                                                    | Small  | Phase 1 |
+| 2.6  | **Wire `pi(P, gmp=True)`**                | `mpfrw_const_pi(r)` — single MPFR call. No Chudnovsky needed.                                                                                                | Small  | Phase 1 |
+| 2.7  | **Instance method wrappers**              | Update `self.sqrt()`, `self.exp()`, etc. to accept `gmp: Bool = False`.                                                                                      | Small  | 2.6     |
+| 2.8  | **Comprehensive test suite**              | For each function × {gmp=True, gmp=False}: verify agreement to requested precision. Edge cases: 0, negative, very large/small. Compare against WolframAlpha. | Large  | 2.7     |
+| 2.9  | **Benchmark all operations**              | Table: operation × precision (28, 100, 500, 1000, 5000) × gmp. Include conversion overhead.                                                                  | Medium | 2.8     |
+
+**Note**: Because each MPFR operation is a single C call, Phase 2 is significantly
+simpler than it would be with `mpz_t` (where we'd need to implement Taylor series,
+Chudnovsky, and Newton iteration ourselves in Mojo using GMPDecimal arithmetic).
+
+**Deliverable**: All iterative BigDecimal functions support `gmp=True`.
+
+**Checkpoint test**: `pi(precision=10000, gmp=True)` matches known π digits and runs
+significantly faster than `pi(precision=10000)`.
+
+### Phase 3: BigInt GMP Backend (Deferred)
+
+**Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred because integer
+operations are less user-facing.
+
+| Step | Task                                            | Detail                                                                 | Effort | Deps    |
+| ---- | ----------------------------------------------- | ---------------------------------------------------------------------- | ------ | ------- |
+| 3.1  | **Word import/export in C wrapper**             | Complete `gmpw_import_u32_le` / `gmpw_export_u32_le`. Test round-trip. | Small  | Phase 1 |
+| 3.2  | **Create `src/decimo/bigint/gmp_backend.mojo`** | GMPBigInt struct. Convert BigInt ↔ GMP via O(n) word import/export.    | Medium | 3.1     |
+| 3.3  | **Add `gmp: Bool = False` to BigInt ops**       | On `multiply`, `floor_divide`, `sqrt`, `gcd`, `power` free functions.  | Medium | 3.2     |
+| 3.4  | **Tests and benchmarks**                        | Verify correctness. Confirm 7–18× speedup.                             | Medium | 3.3     |
+
+### Phase 4: Polish, Distribution & Future
+
+| Step | Task                                                                     | Effort | Deps             |
+| ---- | ------------------------------------------------------------------------ | ------ | ---------------- |
+| 4.1  | **Cross-platform wrapper compilation (Linux)**                           | Small  | Phase 1          |
+| 4.2  | **Pre-built wrapper binaries (macOS ARM64, Linux x86)**                  | Medium | 4.1              |
+| 4.3  | **CLI `--gmp` flag**                                                     | Medium | Phase 2          |
+| 4.4  | **GMPDecimal expression chaining** (user-facing)                         | Medium | Phase 2          |
+|      | ↳ `gx = GMPDecimal(x); r = gx.sqrt().multiply(gx.exp()).to_bigdecimal()` |        |                  |
+| 4.5  | **Documentation: user guide for GMP/MPFR integration**                   | Small  | Phase 2          |
+| 4.6  | **Pure-Mojo runtime detection (when DLHandle available)**                | Medium | Mojo improvement |
+| 4.7  | **GMP rational numbers (`mpq_t`) for exact division**                    | Medium | Future           |
+
+### Estimated Effort Summary
+
+| Phase                       | Scope                                          | Steps | Complexity                                    | Priority |
+| --------------------------- | ---------------------------------------------- | ----- | --------------------------------------------- | -------- |
+| Phase 1: Foundation + sqrt  | MPFR wrapper, GMPDecimal, runtime detect, sqrt | 11    | Medium — most work is C wrapper + plumbing    | **HIGH** |
+| Phase 2: All BigDecimal ops | divide, exp, ln, trig, pi, root                | 9     | **Low** — each op is a thin wrapper over MPFR | **HIGH** |
+| Phase 3: BigInt backend     | Word import/export, mpz_t dispatch             | 4     | Medium                                        | Deferred |
+| Phase 4: Polish             | CLI, cross-platform, docs, chaining            | 7     | Varies                                        | Low      |
+
+---
+
+## 14. Appendix: Prototype Results
+
+### 14.1 Files Created
+
+| File                                | Purpose                   | Status                                 |
+| ----------------------------------- | ------------------------- | -------------------------------------- |
+| `temp/gmp/gmp_wrapper.c`            | C wrapper library source  | ✓ Complete                             |
+| `temp/gmp/libgmp_wrapper.dylib`     | Compiled shared library   | ✓ Built                                |
+| `temp/gmp/test_gmp_ffi.mojo`        | FFI test suite (18 tests) | ✓ 14/18 pass (4 wrong expected values) |
+| `temp/gmp/bench_gmp_vs_decimo.mojo` | GMP benchmark suite       | ✓ Complete                             |
+| `temp/gmp/build_and_test.sh`        | Build and test automation | ✓ Complete                             |
+
+### 14.2 C Wrapper API Summary
+
+```c
+// Lifecycle
+int  gmpw_init(void);                              // → handle
+void gmpw_clear(int handle);
+
+// String I/O
+int  gmpw_set_str(int handle, const char* str, int base);
+int  gmpw_get_str_buf(int handle, char* buf, int bufsize, int base);
+
+// Arithmetic
+void gmpw_add(int r, int a, int b);
+void gmpw_sub(int r, int a, int b);
+void gmpw_mul(int r, int a, int b);
+void gmpw_fdiv_q(int r, int a, int b);             // Floor division
+void gmpw_tdiv_q(int r, int a, int b);             // Truncate division
+void gmpw_mod(int r, int a, int b);
+void gmpw_pow_ui(int r, int base, unsigned long exp);
+void gmpw_powm(int r, int base, int exp, int mod); // Modular exponentiation
+void gmpw_sqrt(int r, int a);
+void gmpw_gcd(int r, int a, int b);
+void gmpw_neg(int r, int a);
+void gmpw_abs(int r, int a);
+
+// Comparison
+int  gmpw_cmp(int a, int b);
+int  gmpw_sign(int handle);
+int  gmpw_sizeinbase10(int handle);
+
+// Bitwise
+void gmpw_and(int r, int a, int b);
+void gmpw_or(int r, int a, int b);
+void gmpw_xor(int r, int a, int b);
+void gmpw_lshift(int r, int a, unsigned long bits);
+void gmpw_rshift(int r, int a, unsigned long bits);
+
+// Binary word I/O
+void gmpw_import_u32_le(int handle, const uint32_t* words, int count);
+int  gmpw_export_u32_le(int handle, uint32_t* out, int max_words);
+int  gmpw_export_u32_count(int handle);
+
+// Meta
+int         gmpw_available(void);
+const char* gmpw_version(void);
+```
+
+### 14.3 Sample Mojo FFI Calls
+
+```mojo
+from sys.ffi import external_call, c_int
+
+# ⚠️ SAFE pattern: pass length explicitly to avoid use-after-free (see Section 15.1)
+def gmp_set_str(h: c_int, s: String):
+    var slen = c_int(len(s))
+    var ptr = Int(s.unsafe_ptr())
+    _ = external_call["gmpw_set_str_n", c_int, c_int, Int, c_int](h, ptr, slen)
+
+# ❌ UNSAFE pattern: DO NOT USE — String may be freed before external_call executes
+# def gmp_set_str(h: c_int, s: String):
+#     _ = external_call["gmpw_set_str", c_int, c_int, Int](h, Int(s.unsafe_ptr()))
+
+def gmp_init() -> c_int:
+    return external_call["gmpw_init", c_int]()
+
+def gmp_clear(h: c_int):
+    external_call["gmpw_clear", NoneType, c_int](h)
+
+def gmp_add(r: c_int, a: c_int, b: c_int):
+    external_call["gmpw_add", NoneType, c_int, c_int, c_int](r, a, b)
+```
+
+### 14.4 Benchmark Data Table
+
+**GMP absolute timings** (per operation, macOS ARM64, Apple M-series):
+
+| Digits  | Multiply | Add    | Divide | Sqrt   | GCD    |
+| ------- | -------- | ------ | ------ | ------ | ------ |
+| 10      | 5 ns     | 7 ns   | —      | 8 ns   | 8 ns   |
+| 100     | 23 ns    | 8 ns   | 67 ns  | 114 ns | 50 ns  |
+| 1,000   | 1 µs     | 20 ns  | 818 ns | 722 ns | 169 ns |
+| 10,000  | 50 µs    | 226 ns | 30 µs  | 21 µs  | 2 µs   |
+| 100,000 | 939 µs   | 2 µs   | 740 µs | 642 µs | 324 µs |
+
+---
+
+## 15. Lessons Learned from Prototype Development
+
+This section documents issues encountered during the GMP prototype build-test cycle
+and their fixes. **Read this before starting implementation** — every item below
+cost significant debugging time and will recur if not handled.
+
+### 15.1 CRITICAL: Mojo String Use-After-Free in FFI Calls
+
+**Symptom**: GMP's `mpz_set_str()` silently fails (returns -1), leaving the mpz value
+as 0. Subsequent operations on zero values trigger `__gmp_overflow_in_mpz` in division
+or crash with corrupt data.
+
+**Root cause**: When you write:
+
+```mojo
+def gmp_set_str(h: c_int, s: String):
+    _ = external_call["gmpw_set_str", c_int, c_int, Int](h, Int(s.unsafe_ptr()))
+```
+
+Mojo's optimizer may drop the `String` parameter `s` (freeing its backing buffer) before
+`external_call` actually reads the pointer. Under heavy allocation pressure (millions of
+BigInt/BigDecimal operations), the freed buffer is immediately reused by the allocator,
+overrwriting the null terminator. GMP then reads past the intended string into adjacent
+memory, encounters non-digit bytes, and returns error code -1.
+
+**How we found it**: The crash only appeared in the full benchmark binary after
+millions of prior allocations, never in small test files. Adding `fprintf(stderr)`
+logging to the C wrapper revealed that:
+
+- `strlen()` on the pointer showed 109 instead of expected 100
+- `mpz_set_str` returned -1 (failure), not 0 (success)
+- The extra bytes were from adjacent String buffers reusing freed memory
+
+**Fix**: Pass the string length explicitly and copy in C:
+
+```c
+// In gmp_wrapper.c
+int gmpw_set_str_n(int h, const char *str, int len) {
+    char *buf = (char *)malloc(len + 1);
+    memcpy(buf, str, len);
+    buf[len] = '\0';
+    int rc = mpz_set_str(handles[h], buf, 10);
+    free(buf);
+    return rc;
+}
+```
+
+```mojo
+# In Mojo
+def gmp_set_str(h: c_int, s: String):
+    var slen = c_int(len(s))    # Extract length BEFORE ptr
+    var ptr = Int(s.unsafe_ptr())
+    _ = external_call["gmpw_set_str_n", c_int, c_int, Int, c_int](h, ptr, slen)
+```
+
+**Rule**: **ALWAYS use `gmpw_set_str_n` with explicit length for any String → C FFI
+call.** Never rely on null-termination of Mojo `String.unsafe_ptr()` in `external_call`.
+
+**Broader implication**: This affects ALL FFI calls that pass Mojo `String` pointers.
+Every such call in the codebase must either:
+
+1. Use the length-aware C variant (recommended), or
+2. Store the String in a local variable AND ensure no other allocation occurs between
+   `unsafe_ptr()` extraction and the `external_call` (fragile, not recommended)
+
+### 15.2 `mojo run` Does Not Support `-Xlinker` Flags
+
+**Symptom**: `mojo run file.mojo -Xlinker -lgmp_wrapper` ignores the linker flags,
+causing "undefined symbol" errors at runtime.
+
+**Cause**: The JIT (`mojo run`) does not process `-Xlinker` flags. Only `mojo build`
+(AOT compilation) supports them.
+
+**Fix**: Always use `mojo build` → run the executable. Never use `mojo run` for
+GMP-linked code.
+
+```bash
+# ✓ Correct
+pixi run mojo build -Xlinker -L... -Xlinker -lgmp_wrapper -o program program.mojo
+./program
+
+# ✗ Wrong — linker flags silently ignored
+pixi run mojo run -Xlinker -L... program.mojo
+```
+
+### 15.3 Mojo 0.26.2 Syntax Differences
+
+Several Mojo syntax changes from earlier versions caused compile errors:
+
+| Feature            | Old Syntax                         | Mojo 0.26.2 Syntax                             |
+| ------------------ | ---------------------------------- | ---------------------------------------------- |
+| Compile-time const | `alias X = 7`                      | `comptime X = 7`                               |
+| String slicing     | `s[:10]`                           | `s[byte=:10]`                                  |
+| UInt → Float64     | `Float64(some_uint)`               | `Float64(Int(some_uint))`                      |
+| UnsafePointer      | `UnsafePointer[Byte](address=ptr)` | `UnsafePointer[Byte](unsafe_from_address=ptr)` |
+
+### 15.4 GMP `__gmp_overflow_in_mpz` Means Bad Input, Not Overflow
+
+Despite the name, `__gmp_overflow_in_mpz` typically means the `mpz_t` struct has
+corrupted metadata (alloc=0, size=0, or invalid limb pointer), NOT that a computation
+result was too large. If you see this error:
+
+1. **First check**: Did `mpz_set_str` fail? (Return value -1 means the string was invalid.)
+2. **Second check**: Was the handle cleared and reused?
+3. **Third check**: Was the handle ever initialized?
+
+In our case it was always (1) — the string pointer was dangling.
+
+### 15.5 Handle Pool Sizing
+
+The C wrapper uses a static array of `MAX_HANDLES = 4096` handles. For Phase 1 (sqrt),
+each Newton iteration typically uses 3–5 temporary handles, so 4096 is plenty. But for
+Phase 2 (complex expression trees), monitor handle usage. Consider:
+
+- Adding `gmpw_handles_in_use()` diagnostic function
+- Expanding to dynamic allocation if 4096 is insufficient
+- Ensuring every code path calls `gmpw_clear()` (RAII via `GMPDecimal.__del__`)
+
+### 15.6 Build Command Template
+
+The full build command for any GMP-linked Mojo file on macOS:
+
+```bash
+pixi run mojo build \
+  -Xlinker -L/path/to/temp/gmp -Xlinker -lgmp_wrapper \
+  -Xlinker -L/opt/homebrew/lib -Xlinker -lgmp \
+  -Xlinker -rpath -Xlinker /path/to/temp/gmp \
+  -Xlinker -rpath -Xlinker /opt/homebrew/lib \
+  -I src \
+  -o output_binary input_file.mojo
+```
+
+Key points:
+
+- `-I src` is needed for `from decimo.xxx import ...` to resolve
+- Both `-L` (link-time) and `-rpath` (run-time) paths needed for each library
+- Build must be run from the project root directory for module resolution
+- The binary must be run from the directory containing `libgmp_wrapper.dylib`
+  (or use absolute rpath)
+
+### 15.7 Debugging Tip: Add `fprintf(stderr)` to C Wrapper
+
+When debugging FFI issues, temporarily add `fprintf(stderr)` logging to the C wrapper
+functions. This is invaluable because Mojo's error reporting for FFI issues is limited.
+The C wrapper can inspect raw pointer values, string contents, and GMP internal state
+(`handles[h]->_mp_alloc`, `_mp_size`, `_mp_d`) that are invisible from the Mojo side.
+
+Always remove debug logging before benchmarking — `fprintf` has measurable overhead.
+
+---
+
+## Summary
+
+GMP/MPFR integration is **technically proven and recommended** for Decimo, with BigDecimal
+as the priority target using **MPFR** (not raw `mpz_t`):
+
+1. **MPFR for BigDecimal** — Uses `mpfr_t` (arbitrary-precision binary float) instead of
+   `mpz_t` + manual scale tracking. Every operation (sqrt, exp, ln, sin, cos, tan, π,
+   divide) is a **single MPFR call**. No Newton iteration or Taylor series implementation
+   needed from the Mojo side. The `GMPDecimal` struct has just one field (`handle`).
+2. **`gmp: Bool = False`** — Simple opt-in parameter on compute-heavy functions. Defaults
+   to native Decimo for backward compatibility. Guard digits handle binary↔decimal rounding.
+3. **Single binary, runtime detection** — The C wrapper uses `dlopen`/`dlsym` to load
+   MPFR lazily at runtime (verified working in Mojo 0.26.2). Users who never call
+   `gmp=True` pay zero cost. If MPFR isn't installed, a clear error message is raised.
+4. **BigInt deferred** to Phase 3 (uses `mpz_t` with word import/export).
+5. **CLI `--gmp` deferred** to Phase 4.
+6. **Phase 2 is trivial** — Because MPFR has built-in functions, wiring each BigDecimal
+   operation is a thin wrapper (~5 lines each), not a reimplementation of algorithms.
+
+The prototype in `temp/gmp/` and the lessons in Section 15 provide a foundation for
+implementation. Start with Phase 1, Step 1.1 (`brew install mpfr`).

From 4704255be3a27f5a99f847aa38597557d344397f Mon Sep 17 00:00:00 2001
From: ZHU Yuhao <dr.yuhao.zhu@outlook.com>
Date: Fri, 3 Apr 2026 22:39:16 +0200
Subject: [PATCH 3/6] Update plan

---
 docs/plans/gmp_integration.md | 915 +++++++++++++++++++++-------------
 1 file changed, 557 insertions(+), 358 deletions(-)

diff --git a/docs/plans/gmp_integration.md b/docs/plans/gmp_integration.md
index 4fbb5c4b..d84b951f 100644
--- a/docs/plans/gmp_integration.md
+++ b/docs/plans/gmp_integration.md
@@ -12,7 +12,7 @@
 
 ### Verdict: FEASIBLE ✓
 
-GMP integration into Decimo is technically feasible and has been proven through a working prototype. A C wrapper approach (`gmp_wrapper.c`) successfully bridges Mojo's FFI layer to GMP, with all core arithmetic operations verified.
+GMP integration into Decimo is technically feasible — I've proven it with a working prototype. A C wrapper (`gmp_wrapper.c`) bridges Mojo's FFI layer to GMP, and all core arithmetic operations pass.
 
 ### Key Findings
 
@@ -29,24 +29,31 @@ GMP integration into Decimo is technically feasible and has been proven through
 
 ### Recommendation
 
-Integrate GMP as an **optional backend for BigInt** (base-2^32) first, where data conversion
-is cheap and measured speedups are 7–18× across operations. BigDecimal integration should
-follow only for operations where GMP's speed advantage (15–353× in benchmarks) outweighs
-the base-10⁹ ↔ binary conversion cost — primarily large multiplications, sqrt, and GCD at
-1,000+ digits.
+Two-type architecture:
 
----
+1. **BigFloat** — a new first-class MPFR-backed binary float type (like mpmath's `mpf`
+   but faster). Requires MPFR. Every operation is a single MPFR call. This is the fast
+   path for scientific computing at high precision.
+2. **BigDecimal** — keeps its native base-10⁹ implementation. Gains an optional
+   `gmp=True` parameter for one-off acceleration of expensive operations (sqrt, exp, ln,
+   etc.) without changing types. No MPFR dependency.
+3. **BigInt** — GMP backend deferred to Phase 3. Cheap O(n) word conversion, 7–18×
+   speedup measured.
+
+BigFloat and BigDecimal coexist because they solve different problems: BigFloat is fast
+binary arithmetic (like mpmath); BigDecimal is exact decimal arithmetic (like Python
+`decimal`). Both in one library, user picks the right tool.
 
 ## 2. Technical Feasibility Assessment
 
 ### 2.1 What Was Tested
 
-A complete prototype was built in `temp/gmp/` consisting of:
+A complete prototype was built in `temp/gmp/`:
 
 1. **C wrapper** (`gmp_wrapper.c`): Handle-based API wrapping GMP's `mpz_t` operations
-2. **Shared library** (`libgmp_wrapper.dylib`): Compiled C wrapper linked against GMP
+2. **Shared library** (`libgmp_wrapper.dylib`): Compiled, linked against GMP
 3. **Mojo FFI test** (`test_gmp_ffi.mojo`): 18 tests covering all operation categories
-4. **Mojo benchmark** (`bench_gmp_vs_decimo.mojo`): Performance characterization across digit sizes
+4. **Mojo benchmark** (`bench_gmp_vs_decimo.mojo`): Performance comparison across digit sizes
 
 ### 2.2 Prototype Results
 
@@ -70,16 +77,15 @@ A complete prototype was built in `temp/gmp/` consisting of:
 
 ### 2.3 Why C Wrapper Is Required
 
-Direct FFI to GMP is impractical in Mojo 0.26.2 because:
+Direct FFI to GMP doesn't work in Mojo 0.26.2 because:
 
 1. **No `DLHandle`**: Mojo 0.26.2 does not expose `DLHandle` for runtime dynamic library loading. Only `external_call` (compile-time/link-time binding) is available.
 2. **`mpz_t` is a struct**: GMP's `mpz_t` is `__mpz_struct[1]` (a 1-element array used as a pass-by-reference trick). Mojo cannot directly represent or pass this.
 3. **Pointer origin tracking**: `UnsafePointer` in Mojo 0.26.2 has strict origin tracking that prevents constructing pointers from raw integer addresses in non-`fn` contexts.
 4. **GMP macros**: Many GMP operations (e.g., `mpz_init`, `mpz_clear`) are preprocessor macros, not true function symbols.
 
-The C wrapper solves all these issues by exposing a flat, handle-based C API where Mojo passes and receives only primitive types (`Int32`, `Int`).
-
----
+The C wrapper solves all of this by exposing a flat, handle-based C API where Mojo
+passes and receives only primitive types (`Int32`, `Int`).
 
 ## 3. Mojo FFI Capabilities and Constraints
 
@@ -115,8 +121,6 @@ var result = external_call["gmpw_add", NoneType, Int32, Int32, Int32](a, b, c)
 - **Global variables**: Would allow module-level GMP backend state
 - **Conditional compilation (`@static_if`)**: Would enable clean feature toggling
 
----
-
 ## 4. Architecture Analysis
 
 ### 4.1 Current Decimo Architecture
@@ -142,31 +146,30 @@ mpf_t                →      Binary floating-point with configurable precision
 
 ### 4.3 Type Mapping
 
-| Decimo Type    | GMP Equivalent                      | Conversion Cost                                       | Integration Priority |
-| -------------- | ----------------------------------- | ----------------------------------------------------- | -------------------- |
-| **BigInt**     | `mpz_t`                             | **Low** — same binary base (2^32 → 2^64 word packing) | **Phase 1** (HIGH)   |
-| **BigUInt**    | `mpz_t` (unsigned subset)           | **High** — base-10^9 → binary conversion              | Phase 3 (LOW)        |
-| **BigDecimal** | `mpf_t` or `mpz_t` + manual scaling | **High** — base conversion + scale management         | Phase 2 (MEDIUM)     |
-| **Dec128**     | Not applicable                      | N/A — fixed precision, already fast                   | Skip                 |
+| Decimo Type          | GMP Equivalent                  | Conversion Cost                                       | Integration Priority |
+| -------------------- | ------------------------------- | ----------------------------------------------------- | -------------------- |
+| **BigFloat** *(new)* | `mpfr_t`                        | **None** — direct MPFR handle                         | **Phase 1** (HIGH)   |
+| **BigInt**           | `mpz_t`                         | **Low** — same binary base (2^32 → 2^64 word packing) | **Phase 3** (MEDIUM) |
+| **BigUInt**          | `mpz_t` (unsigned subset)       | **High** — base-10^9 → binary conversion              | Skip                 |
+| **BigDecimal**       | via BigFloat (`gmp=True` sugar) | **High** — string conversion, but single round-trip   | **Phase 2** (HIGH)   |
+| **Dec128**           | Not applicable                  | N/A — fixed precision, already fast                   | Skip                 |
 
-### 4.4 Where GMP Adds Value
+### 4.4 Where GMP Helps
 
-GMP's advantages over Decimo's native implementation:
+GMP's advantages over my native implementation:
 
 1. **Highly optimized assembly**: GMP includes hand-tuned ARM64 assembly for core operations
 2. **Mature algorithm suite**: Decades of optimization (Schönhage-Strassen for huge multiplications, sub-quadratic GCD, etc.)
 3. **Large-number regime**: GMP's algorithmic cutoffs are well-tuned for every platform
 4. **Additional algorithms**: FFT-based multiplication, extended GCD, primality testing, etc.
 
-Where GMP does **not** add value:
+Where GMP does **not** help:
 
 1. **Small numbers** (<50 digits): FFI overhead dominates
 2. **Dec128**: Fixed 128-bit operations are already optimal with native code
 3. **Base-10 operations**: GMP works in binary; base-10 output requires conversion
 4. **Decimal arithmetic**: GMP has no native decimal type; scale management would remain in Mojo
 
----
-
 ## 5. Performance Comparison: GMP vs Decimo
 
 All benchmarks run side-by-side in a single binary (`bench_gmp_vs_decimo.mojo`), compiled
@@ -249,7 +252,7 @@ Decimo currently uses the Euclidean algorithm. This is the smallest gap of all o
 
 ### 5.2 BigDecimal (base-10⁹) vs GMP Integer Equivalents
 
-GMP has no native decimal type. To compare fairly, we benchmark GMP performing the
+GMP has no native decimal type. To compare fairly, I benchmark GMP performing the
 **equivalent integer computation** that a GMP-backed BigDecimal would perform:
 
 - **Multiply**: GMP integer multiply on same-digit-count coefficients
@@ -257,7 +260,7 @@ GMP has no native decimal type. To compare fairly, we benchmark GMP performing t
 - **Division (1/3) at precision P**: GMP `(1 × 10^P) / 3` vs Decimo `BigDecimal.true_divide(…, precision=P)`
 
 These ratios represent the **best-case speedup** from a GMP backend, before
-accounting for base-10⁹ ↔ binary conversion overhead.
+accounting for the base-10⁹ ↔ binary conversion overhead.
 
 #### BigDecimal Multiplication (n sig-digits × n sig-digits)
 
@@ -325,7 +328,7 @@ The C wrapper call path has fixed overhead:
 2. Handle array lookup: ~1–2 ns
 3. Total fixed overhead: **~3–5 ns per call**
 
-This means:
+In practice:
 
 - At **10 digits**: overhead is 30–100% of operation time → **GMP not beneficial**
 - At **100 digits**: overhead is 5–10% → **GMP moderately beneficial**
@@ -334,21 +337,19 @@ This means:
 **Recommended crossover threshold**: Use GMP backend for numbers with **≥64 words** (~600
 digits for BigInt). Below this, Decimo's native implementation avoids FFI overhead.
 
-### 5.5 Technical Note: Mojo String FFI Safety
+### 5.5 Mojo String FFI Safety Issue
 
-During benchmarking, we discovered a Mojo compiler issue where the optimizer may drop a
+During benchmarking, I hit a Mojo compiler issue where the optimizer may drop a
 `String` parameter's backing buffer before the `external_call` using its `unsafe_ptr()`
 completes. Under heavy allocation pressure, the freed buffer gets reused immediately,
 causing GMP's `mpz_set_str` to read corrupted data.
 
-**Workaround**: A length-aware C function `gmpw_set_str_n(handle, ptr, len)` was added that
+**Workaround**: I added a length-aware C function `gmpw_set_str_n(handle, ptr, len)` that
 copies exactly `len` bytes into a null-terminated buffer before calling GMP, making the
 FFI call safe regardless of the caller's buffer lifetime. Any production integration must
 use the length-based variant (`gmpw_set_str_n`) or ensure String buffers remain live
 through explicit variable binding.
 
----
-
 ## 6. Operation Coverage Mapping
 
 ### 6.1 BigInt Operations → GMP Functions
@@ -406,13 +407,11 @@ These must remain native Mojo:
 - CLI calculator evaluation
 - Dec128 operations (fixed-precision)
 
----
-
 ## 7. Data Conversion Between Decimo and GMP
 
 ### 7.1 BigInt ↔ GMP (Efficient)
 
-BigInt uses base-2^32 `List[UInt32]` in little-endian order — essentially the same representation as GMP's `mpz_t` but with 32-bit limbs instead of 64-bit.
+BigInt uses base-2^32 `List[UInt32]` in little-endian order — basically the same thing as GMP's `mpz_t` but with 32-bit limbs instead of 64-bit.
 
 **Conversion approach**:
 
@@ -453,7 +452,7 @@ int  gmpw_export_u32_le(int handle, uint32_t* out_buf, int max_count);
 
 ### 7.2 BigUInt ↔ GMP (Expensive)
 
-BigUInt uses base-10^9 — a fundamentally different number base than GMP's binary format.
+BigUInt uses base-10^9 — a completely different number base from GMP's binary format.
 
 **Conversion options**:
 
@@ -468,7 +467,7 @@ BigUInt uses base-10^9 — a fundamentally different number base than GMP's bina
 - For <1000 words: Horner's method (simpler, O(n²) acceptable)
 - For ≥1000 words: Divide-and-conquer using GMP's own multiply for combining
 
-**Break-even analysis**: If we assume base conversion costs ~O(n²) for n words:
+**Break-even analysis**: Assuming base conversion costs ~O(n²) for n words:
 
 - At 100 words (~900 digits): conversion ~10 µs, GMP multiply saves ~4 µs → **NOT worth it**
 - At 1000 words (~9000 digits): conversion ~1 ms, GMP multiply saves ~150 µs → **NOT worth it for single operation**
@@ -495,13 +494,11 @@ BigUInt uses base-10^9 — a fundamentally different number base than GMP's bina
                     └──────────────────────────────────────┘
 ```
 
----
-
 ## 8. GMP Detection and Fallback Strategy
 
 ### 8.1 Detection Approaches
 
-Since Mojo 0.26.2 lacks `DLHandle` for runtime library probing, detection must be **build-time** or **semi-static**:
+Since Mojo 0.26.2 doesn't have `DLHandle` for runtime library probing, detection has to be **build-time** or **semi-static**:
 
 #### Option A: Compile-Time Feature Flag (Recommended for Now)
 
@@ -578,229 +575,274 @@ User calls BigInt.multiply(a, b):
      → Use native Mojo implementation (current code)
 ```
 
----
+## 9. Type Design: BigFloat and BigDecimal
+
+Decimo gets two arbitrary-precision types with different backends and semantics.
 
-## 9. API Design: `gmp: Bool = False` Parameter
+### 9.1 Two Types, Two Number Systems
 
-### 9.1 Design Rationale
+|                    | **BigDecimal**                   | **BigFloat**                                       |
+| ------------------ | -------------------------------- | -------------------------------------------------- |
+| Internal base      | 10⁹ (decimal)                    | binary (MPFR `mpfr_t`)                             |
+| Python analogue    | `decimal.Decimal`                | `mpmath.mpf`                                       |
+| `sqrt(0.01)`       | `0.1` (exact)                    | `0.1000...0` (padded)                              |
+| Dependency         | None                             | MPFR required                                      |
+| Speed at high prec | Native Mojo                      | 10–400× faster (MPFR)                              |
+| Precision unit     | Decimal digits                   | Bits (user specifies digits, converted internally) |
+| Rounding           | Decimal-exact where possible     | Correctly rounded per IEEE binary semantics        |
+| Primary use case   | Finance, human-friendly decimals | Scientific computing, transcendentals              |
 
-Since Mojo lacks global variables, a module-level context or engine setting is not
-possible. Instead, we add a **`gmp: Bool = False`** keyword parameter to compute-heavy
-BigDecimal functions. This is:
+They coexist in the same library. User picks the right tool for the job.
 
-- **Explicit** — users opt in per-call; no hidden magic
-- **Backward-compatible** — defaults to `False`, existing code unchanged
-- **Composable** — users pass `gmp=True` down through complex expressions like π calculation
-- **Simple** — a Bool is clearer than an engine enum for two choices
+### 9.2 Why Two Types Instead of a Backend Swap
 
-### 9.2 Which Functions Get the `gmp` Parameter
+`mpmath` pulls off a transparent backend swap (`MPZ = gmpy.mpz or int`) because:
 
-Only **iterative compute-heavy** operations — not add/sub/multiply (which are building blocks
-called internally and whose overhead is handled by the GMPDecimal struct).
+- Python is dynamically typed — the swap is invisible at runtime
+- `mpmath` already uses binary representation internally — swapping `int` for `mpz_t`
+  doesn't change semantics
 
-| Function                                   | Gets `gmp`? | Reason                                         |
-| ------------------------------------------ | ----------- | ---------------------------------------------- |
-| `true_divide(x, y, precision, gmp)`        | ✓ Yes       | Long division at high precision benefits       |
-| `sqrt(x, precision, gmp)`                  | ✓ Yes       | Newton iteration: 86–353× gap                  |
-| `root(x, n, precision, gmp)`               | ✓ Yes       | Newton iteration like sqrt                     |
-| `exp(x, precision, gmp)`                   | ✓ Yes       | Taylor series with many multiplications        |
-| `ln(x, precision, gmp)`                    | ✓ Yes       | Atanh-based approach with iterative multiplies |
-| `sin(x, precision, gmp)`                   | ✓ Yes       | Taylor series                                  |
-| `cos(x, precision, gmp)`                   | ✓ Yes       | Taylor series                                  |
-| `tan(x, precision, gmp)`                   | ✓ Yes       | Uses sin/cos                                   |
-| `pi(precision, gmp)`                       | ✓ Yes       | Chudnovsky binary splitting: many large muls   |
-| `BigDecimal.__add__`, `__sub__`, `__mul__` | ✗ No        | Handled internally by GMPDecimal               |
-| `BigDecimal.__floordiv__`, `round`         | ✗ No        | Digit manipulation, no heavy arithmetic        |
+I can't do this in Decimo because:
 
-### 9.3 Proposed Signatures
+1. **Mojo is statically typed** — users must choose their type at compile time.
+2. **BigDecimal uses base-10⁹** — swapping BigUInt for `mpz_t` would change decimal
+   semantics (e.g., `sqrt(0.01)` behavior).
+3. **They solve different problems** — BigDecimal is for exact decimal arithmetic;
+   BigFloat is for fast scientific computation.
+
+Python itself acknowledges this split: `decimal.Decimal` and `mpmath.mpf` are separate
+packages. I'm combining both under one import.
+
+### 9.3 BigFloat: MPFR-Backed Binary Float
 
 ```mojo
-# Free functions (primary API)
-def true_divide(x: BigDecimal, y: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def sqrt(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def root(x: BigDecimal, n: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def exp(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def ln(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def sin(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def cos(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def tan(x: BigDecimal, precision: Int, gmp: Bool = False) raises -> BigDecimal:
-def pi(precision: Int, gmp: Bool = False) raises -> BigDecimal:
-
-# Instance methods (delegate to free functions)
-def sqrt(self, precision: Int = PRECISION, gmp: Bool = False) raises -> Self:
-    return exponential.sqrt(self, precision, gmp)
+struct BigFloat:
+    """Arbitrary-precision binary floating-point, backed by MPFR.
+
+    Like mpmath's mpf but faster — every operation is a single MPFR call.
+    Requires MPFR to be installed. If MPFR is unavailable, construction raises.
+    """
+    var handle: Int32  # MPFR handle (via C wrapper handle pool)
 ```
 
+**BigFloat requires MPFR. No fallback.**
+
+I considered a pure-Mojo BigInt-based fallback (like mpmath's pure-Python mode), but:
+
+- mpmath's fallback arithmetic is ~30K lines of Python for the math functions alone
+  (Taylor series, argument reduction, binary splitting, etc.)
+- Reimplementing all that in Mojo defeats the purpose — BigFloat's value proposition
+  IS speed via MPFR
+- If MPFR isn't installed, users have BigDecimal — it already does sqrt, exp, ln, sin,
+  cos, tan, pi, all in pure Mojo. Just slower at high precision.
+- A half-working BigFloat (add/sub/mul work but sqrt/exp/ln don't) would be confusing
+
+So: BigFloat without MPFR is like numpy without its C core. Users without MPFR use
+BigDecimal. This is an honest, clean boundary.
+
 ### 9.4 Why MPFR, Not `mpz_t` with Manual Scale
 
-GMP provides three numeric types:
+GMP has three numeric types:
 
-| Type     | What it stores                                      | Mojo analogue | Use for BigDecimal?                                             |
-| -------- | --------------------------------------------------- | ------------- | --------------------------------------------------------------- |
-| `mpz_t`  | Arbitrary-precision **integer**                     | BigInt        | ✗ Poor — must manually track scale, sign, scale-up for division |
-| `mpf_t`  | Arbitrary-precision **binary float**                | (none)        | △ Works but legacy, no correct rounding                         |
-| `mpfr_t` | Arbitrary-precision **binary float** (MPFR library) | BigDecimal    | **✓ Best** — correctly rounded, built-in exp/ln/sin/cos/π       |
+| Type     | What it stores                                      | Mojo analogue | Use for BigFloat?                                                  |
+| -------- | --------------------------------------------------- | ------------- | ------------------------------------------------------------------ |
+| `mpz_t`  | Arbitrary-precision **integer**                     | BigInt        | ✗ Poor — must manually track scale, sign, implement all algorithms |
+| `mpf_t`  | Arbitrary-precision **binary float** (GMP built-in) | —             | △ Works but legacy, no correct rounding                            |
+| `mpfr_t` | Arbitrary-precision **binary float** (MPFR library) | **BigFloat**  | **✓ Best** — correctly rounded, built-in exp/ln/sin/cos/π          |
 
-If we used `mpz_t` (integer only), we'd need to:
+With `mpz_t`, I'd have to:
 
-- Track `scale` and `sign` separately in GMPDecimal (3 fields)
-- Manually scale up the numerator by 10^N before integer division
-- Implement Newton iterations for sqrt, Taylor series for exp/ln/trig — **ourselves**
+- Store `(sign, mantissa: mpz_t, exponent: Int)` and manage scale manually
+- Scale up numerators by 10^N before integer division
+- Implement Newton iterations for sqrt, Taylor series for exp/ln/trig — **myself**
 - Handle scale alignment for addition/subtraction
+- That's basically reimplementing mpmath in Mojo (~30K lines)
 
-With **MPFR** (`mpfr_t`), all of this is handled internally:
+With **MPFR**, all of this is handled internally:
 
 - Division: `mpfr_div(r, a, b, MPFR_RNDN)` — one call
 - Sqrt: `mpfr_sqrt(r, a, MPFR_RNDN)` — one call
-- Exp: `mpfr_exp(r, a, MPFR_RNDN)` — one call, correctly rounded
-- Ln: `mpfr_log(r, a, MPFR_RNDN)` — one call
+- Exp/Ln: `mpfr_exp`, `mpfr_log` — one call each, correctly rounded
 - Sin/Cos/Tan: `mpfr_sin`, `mpfr_cos`, `mpfr_tan` — one call each
 - π: `mpfr_const_pi(r, MPFR_RNDN)` — one call, arbitrary precision
 
-**MPFR** is a separate library built on top of GMP. It is the standard for
-arbitrary-precision floating-point and is what Python's `mpmath` (via `gmpy2`)
-uses under the hood. It is widely available: `brew install mpfr` (macOS),
-`apt install libmpfr-dev` (Linux).
+MPFR sits on top of GMP and is the industry standard for arbitrary-precision
+floating-point. Easy to get: `brew install mpfr` (macOS), `apt install libmpfr-dev`
+(Linux).
 
-### 9.5 The GMPDecimal Intermediate Struct (MPFR-Based)
-
-Because MPFR natively handles sign, scale, and precision, the struct simplifies
-to **a single field** — the MPFR handle:
+### 9.5 BigFloat API
 
 ```mojo
-struct GMPDecimal:
-    """A decimal value held in MPFR floating-point representation.
-    
-    Wraps a single mpfr_t handle. All arithmetic, sqrt, exp, ln, trig
-    are single MPFR calls. Convert to/from BigDecimal only at entry/exit.
-    """
-    var handle: Int32      # MPFR handle (via C wrapper)
-
-    def __init__(out self, bd: BigDecimal, precision: Int):
-        """Convert BigDecimal → GMPDecimal.
-        
-        precision: decimal digits wanted → converted to bits internally.
-        """
-        # 1 decimal digit ≈ 3.322 bits; add guard bits
-        var bits = precision * 4 + 64
-        self.handle = mpfrw_init(bits)
-        # Pass full decimal string (e.g. "-3.14159") directly
-        mpfrw_set_str(self.handle, str(bd))
-
-    def to_bigdecimal(self, precision: Int) -> BigDecimal:
-        """Convert GMPDecimal → BigDecimal.
-        
-        Exports MPFR value as decimal string, truncated to `precision` digits.
-        """
-        var s = mpfrw_get_str(self.handle, precision)
-        return BigDecimal(s)
-
-    # Arithmetic: each is a single MPFR call
-    def multiply(self, other: GMPDecimal) -> GMPDecimal: ...
-    def add(self, other: GMPDecimal) -> GMPDecimal: ...
-    def divide(self, other: GMPDecimal) -> GMPDecimal: ...
-    def sqrt(self) -> GMPDecimal: ...
-    def exp(self) -> GMPDecimal: ...
-    def ln(self) -> GMPDecimal: ...
-    def sin(self) -> GMPDecimal: ...
-    def cos(self) -> GMPDecimal: ...
-    def pi(precision: Int) -> GMPDecimal: ...  # static
-
-    def __del__(owned self):
-        """RAII: free MPFR handle on destruction."""
-        mpfrw_clear(self.handle)
+# Construction
+var x = BigFloat("3.14159", precision=1000)   # from string
+var y = BigFloat(bd, precision=1000)           # from BigDecimal
+var z = BigFloat.pi(precision=5000)            # constant
+
+# Arithmetic (returns BigFloat)
+var r = x + y
+var r = x * y
+var r = x / y
+
+# Transcendentals (single MPFR call each)
+var r = x.sqrt()
+var r = x.exp()
+var r = x.ln()
+var r = x.sin()
+var r = x.cos()
+var r = x.tan()
+var r = x.root(n)
+var r = x.power(y)
+
+# Conversion back to BigDecimal
+var bd = r.to_bigdecimal(precision=1000)
+
+# Chained computation (stays in MPFR the whole time)
+var result = (
+    BigFloat("23", P).sqrt()
+    .multiply(BigFloat("2.1", P).exp())
+    .divide(BigFloat.pi(P))
+    .to_bigdecimal(P)
+)
+# Cost: 2 string→MPFR conversions in, 5 MPFR calls, 1 MPFR→string out
+# No intermediate decimal conversions
 ```
 
-Compare the old vs. new struct:
+Precision is specified in decimal digits but converted to bits internally:
+`bits = ceil(digits × 3.322) + guard_bits`. Guard bits (64 extra) ensure the
+requested decimal digits are all correct after binary→decimal conversion.
 
-| Aspect          | Old (mpz_t + manual scale)                  | New (MPFR)                    |
-| --------------- | ------------------------------------------- | ----------------------------- |
-| Fields          | `handle`, `scale`, `sign` (3 fields)        | `handle` (1 field)            |
-| Division        | Scale up by 10^N, `mpz_tdiv_q`, track scale | `mpfr_div(r, a, b, RNDN)`     |
-| Sqrt            | Newton iteration in Mojo using GMPDecimal   | `mpfr_sqrt(r, a, RNDN)`       |
-| Exp             | Taylor series in Mojo using GMPDecimal      | `mpfr_exp(r, a, RNDN)`        |
-| Ln              | Atanh series in Mojo using GMPDecimal       | `mpfr_log(r, a, RNDN)`        |
-| Sin/Cos/Tan     | Taylor series in Mojo                       | `mpfr_sin` / `mpfr_cos` / ... |
-| π               | Chudnovsky in Mojo using GMPDecimal         | `mpfr_const_pi(r, RNDN)`      |
-| Sign handling   | Manual XOR                                  | Automatic                     |
-| Scale alignment | Manual multiply by 10^diff                  | Automatic                     |
-| Rounding        | Manual truncation                           | Correctly rounded (MPFR)      |
+### 9.6 BigDecimal `gmp=True` Convenience
 
-**Data flow for `gmp=True`**:
+BigDecimal keeps an optional `gmp=True` parameter for **one-off acceleration**
+without changing types. Internally it does `BigDecimal → BigFloat → compute → BigDecimal`:
 
-```txt
-sqrt(x: BigDecimal, precision=1000, gmp=True):
-  1. gx = GMPDecimal(x, precision=1010)     ← ONE conversion: string → mpfr_t
-  2. mpfr_sqrt(result, gx, MPFR_RNDN)      ← ONE GMP call (entire computation)
-  3. return result.to_bigdecimal(1000)      ← ONE conversion: mpfr_t → string
+```mojo
+# Single-call sugar (user stays in BigDecimal world)
+var r = sqrt(x, precision=5000, gmp=True)
+var r = exp(x, precision=5000, gmp=True)
+var r = pi(precision=10000, gmp=True)
+
+# Equivalent to:
+var r = BigFloat(x, 5010).sqrt().to_bigdecimal(5000)
 ```
 
-For a chained expression like `sqrt(23) * exp(2.1) / pi()`:
+Only **iterative compute-heavy** operations get the `gmp` parameter:
 
-```txt
-var a = GMPDecimal(BigDecimal("23"), prec)    ← 1 conversion
-var b = GMPDecimal(BigDecimal("2.1"), prec)   ← 1 conversion
-var sa = a.sqrt()                              ← mpfr_sqrt (one call)
-var eb = b.exp()                               ← mpfr_exp (one call)
-var p  = GMPDecimal.pi(prec)                   ← mpfr_const_pi (one call)
-var r  = sa.multiply(eb).divide(p)             ← mpfr_mul + mpfr_div
-return r.to_bigdecimal(precision)              ← 1 conversion
-```
+| Function                        | Gets `gmp`? | Reason                                  |
+| ------------------------------- | ----------- | --------------------------------------- |
+| `sqrt`, `root`                  | ✓ Yes       | Newton iteration → single MPFR call     |
+| `exp`, `ln`                     | ✓ Yes       | Taylor/atanh series → single MPFR call  |
+| `sin`, `cos`, `tan`             | ✓ Yes       | Taylor series → single MPFR call        |
+| `pi`                            | ✓ Yes       | Chudnovsky → `mpfr_const_pi`            |
+| `true_divide`                   | ✓ Yes       | Long division at high precision         |
+| `__add__`, `__sub__`, `__mul__` | ✗ No        | Too cheap — FFI overhead not worth it   |
+| `round`, `__floordiv__`         | ✗ No        | Digit manipulation, no heavy arithmetic |
 
-Total: 3 conversions (2 in + 1 out) + 5 MPFR calls. No Newton iterations,
-no Taylor series, no manual scale tracking.
+**When to use `gmp=True` vs BigFloat directly:**
 
-### 9.6 Behavioral Differences Between `gmp=False` and `gmp=True`
+- Single operation on BigDecimal inputs? → `gmp=True` (one liner, stays in BigDecimal)
+- Chain of operations? → Use BigFloat directly (avoids repeated conversion)
 
-**Important**: MPFR is a **binary** floating-point internally (mantissa stored in base-2).
-This means:
+### 9.7 Behavioral Differences: Binary vs. Decimal
 
-1. **Precision is in bits, not digits** — We convert: `bits = ceil(digits × 3.322)`. To
-   guarantee N correct decimal digits, we request ~N×4 bits (generous guard).
+MPFR is a **binary** float internally. What this means:
 
-2. **Last 1–2 decimal digits may differ** — Binary↔decimal rounding means the last few
-   digits of `gmp=True` output may not match `gmp=False` output exactly. The numerical
-   value will agree to the requested precision.
+1. **Last 1–2 decimal digits may differ** between `gmp=True` and `gmp=False`. The
+   numerical value agrees to the requested precision, but the least significant digits
+   can differ due to binary↔decimal rounding.
 
-3. **Fix: guard digits** — Compute internally with `precision + 10` extra digits, then
-   truncate the result to the requested precision. This is standard practice (Python's
-   `decimal` module does the same).
+2. **Guard digits handle this** — I compute with `precision + 10` extra digits, then
+   truncate. Standard practice (Python's `decimal` module does the same).
 
-4. **MPFR guarantees correct rounding** — Every operation is correctly rounded to the
-   working precision. This is actually *stronger* than Decimo's native implementation
+3. **MPFR guarantees correct rounding** — every operation is correctly rounded to the
+   working binary precision. Actually *stronger* than Decimo's native implementation
    which may accumulate rounding errors across Newton iterations.
 
-This is why `gmp` defaults to `False` — so existing behavior is preserved and users
-explicitly opt into the MPFR path when they want speed. In practice, the numerical
-results from MPFR are at least as accurate as the native implementation.
+4. **Semantic difference with BigFloat** — `BigFloat("0.01").sqrt()` returns
+   `0.1000000...0` (binary representation of 0.1). `BigDecimal("0.01").sqrt()` returns
+   exactly `0.1`. For scientific computing this doesn't matter. For financial/display
+   work, use BigDecimal.
 
-### 9.6 CLI Integration (Deferred)
+### 9.8 When to Use Which
 
-CLI integration is deferred to a later phase. When implemented, a `--gmp` flag on the
-command line would set `gmp=True` for all compute-heavy operations in the session.
-The evaluator would pass this flag down through the function dispatch.
+| Scenario                                 | Use                      | Why                                               |
+| ---------------------------------------- | ------------------------ | ------------------------------------------------- |
+| Financial calculations                   | BigDecimal               | Exact decimal representation, no binary surprises |
+| Display-friendly output                  | BigDecimal               | `0.1` stays `0.1`, not `0.10000...00001`          |
+| Fast transcendentals (exp, ln, sin, cos) | BigFloat                 | Single MPFR call, 10–353× faster                  |
+| Scientific computing                     | BigFloat                 | MPFR's correctly-rounded arithmetic               |
+| Chained high-precision computation       | BigFloat                 | Stays in MPFR, no intermediate conversions        |
+| One-off speedup without changing types   | `gmp=True` on BigDecimal | Drop-in acceleration, backward compatible         |
+| No MPFR installed                        | BigDecimal               | Works everywhere, pure Mojo                       |
+| Quick calculation under ~100 digits      | BigDecimal               | FFI overhead not worth it at low precision        |
 
-### 9.7 Future: Global Variable or Context
+### 9.9 Future: Lazy Evaluation for BigDecimal
 
-When Mojo adds global variables, the `gmp` parameter can be replaced or supplemented by
-a module-level default:
+The `gmp=True` per-call approach has a weakness: chained operations convert back and
+forth between BigDecimal and MPFR at every step. A future `lazy()` API fixes this:
 
 ```mojo
-# Future (when Mojo supports global variables)
-BigDecimal.set_default_engine("gmp")  # Sets global default
-var result = sqrt(x, 1000)            # Automatically uses GMP
+# Future API
+var result = (
+    BigDecimal("23").lazy()
+    .sqrt(5000)
+    .multiply(BigDecimal("2.1").lazy().exp(5000))
+    .divide(BigDecimal.pi(5000))
+    .collect()       # executes the whole chain
+)
 ```
 
-The `gmp` parameter will remain as an explicit override even after global variables
-are available, for cases where the user wants fine-grained control.
+How it would work:
+
+1. `.lazy()` returns a `LazyDecimal` that **records operations** instead of evaluating
+2. `.collect()` analyzes the expression tree and picks the best execution strategy
+3. If MPFR available: convert leaf inputs to MPFR **once**, execute entire DAG in MPFR,
+   convert root output back **once** — zero intermediate conversions
+4. If MPFR unavailable (or `.collect(gmp=False)`): execute natively in BigDecimal
+
+This is the same pattern as:
+
+- **Polars** — lazy DataFrames, `.collect()` triggers execution
+- **Spark** — query planning, transformations are lazy until action
+- **C++ Eigen** — expression templates, deferred evaluation until assignment
+- **TensorFlow 1.x** — computation graphs, `session.run()` triggers execution
+
+The key insight: users write BigDecimal code (familiar API), the engine figures out the
+fastest execution path. This goes beyond the roadmap below but is architecturally clean
+because BigFloat and the MPFR wrapper already provide the fast backend.
 
----
+### 9.10 CLI Integration
+
+With BigFloat as a first-class type, the CLI can offer:
+
+```bash
+decimo "sqrt(2)" -p 1000                     # default: BigDecimal
+decimo "sqrt(2)" -p 1000 --mode float         # use BigFloat (requires MPFR)
+decimo "sqrt(2)" -p 1000 --gmp                # BigDecimal with gmp=True
+```
+
+`--mode float` tells the evaluator to parse all literals as BigFloat and evaluate the
+entire expression in MPFR. `--gmp` keeps BigDecimal but enables `gmp=True` on each
+operation (current plan, deferred to Phase 4).
+
+### 9.11 Future: Global Variable or Context
+
+When Mojo gets global variables, a module-level default can supplement the per-call
+parameter:
+
+```mojo
+# Future (when Mojo supports global variables)
+BigDecimal.set_default_engine("gmp")
+var result = sqrt(x, 1000)            # automatically uses GMP
+```
+
+The `gmp` parameter stays as an explicit override for fine-grained control.
 
 ## 10. Build System Integration
 
 ### 10.1 C Wrapper Compilation
 
-The C wrapper must be compiled as a shared library and distributed alongside the Mojo package.
+The C wrapper gets compiled as a shared library, shipped alongside the Mojo package.
 
 **Build script** (`build_gmp_wrapper.sh`):
 
@@ -862,13 +904,12 @@ mojo build \
 
 ### 10.4 Runtime Detection: Single Binary Without Link-Time Dependencies
 
-**Goal**: A single compiled binary works whether or not MPFR/GMP is installed on the
-target machine. If `gmp=True` is requested and MPFR is found, use it. If not found,
-raise a helpful error. Users who never call `gmp=True` pay zero cost.
+**Goal**: One compiled binary that works whether or not MPFR/GMP is on the
+machine. If `gmp=True` is requested and MPFR is found, use it. If not, raise
+a helpful error. If nobody calls `gmp=True`, zero cost.
 
 **Verified**: `dlopen` works from Mojo 0.26.2 via `external_call["dlopen", ...]`.
-Tested successfully — GMP was loaded at runtime without any `-Xlinker` flags at
-compile time.
+I tested it — GMP loaded at runtime without any `-Xlinker` flags at compile time.
 
 **Architecture**: The C wrapper does lazy loading internally using `dlopen`/`dlsym`,
 so it compiles and links **without** needing MPFR/GMP headers or libraries present:
@@ -952,9 +993,9 @@ mojo build -Xlinker -L./path/to/wrapper -Xlinker -ldecimo_gmp_wrapper \
            -o myprogram myprogram.mojo
 ```
 
-The wrapper itself is a tiny C file (~200 lines) that can be compiled on any system
-with just a C compiler — no GMP/MPFR headers needed. MPFR is only loaded at runtime
-when `gmp=True` is actually called.
+The wrapper is a tiny C file (~200 lines) that compiles on any system
+with just a C compiler — no GMP/MPFR headers needed. MPFR only gets loaded at
+runtime when `gmp=True` is actually called.
 
 **Summary of approach by timeline**:
 
@@ -963,8 +1004,6 @@ when `gmp=True` is actually called.
 | **Now** (Mojo 0.26.2) | C wrapper + `dlopen`/`dlsym` | Link wrapper only; MPFR loaded lazily at runtime |
 | **Future** (DLHandle) | Mojo-native `DLHandle`       | Pure Mojo detection, no C wrapper needed         |
 
----
-
 ## 11. Cross-Platform Considerations
 
 ### 11.1 Platform Matrix
@@ -1014,8 +1053,6 @@ pkg-config --exists gmp && pkg-config --cflags --libs gmp
 - Different shared library format (`.dll` + import lib)
 - Mojo on Windows is still early — defer this concern
 
----
-
 ## 12. Risk Analysis
 
 ### 12.1 Technical Risks
@@ -1046,102 +1083,270 @@ pkg-config --exists gmp && pkg-config --cflags --libs gmp
 | Unnecessary conversions                  | Converting back and forth repeatedly | Cache GMP handles during multi-step computations |
 | GMP slower for specific operations       | Possible edge cases                  | Benchmark-driven thresholds per operation type   |
 
----
-
-## 13. Implementation Roadmap
-
-> Priority: **BigDecimal first**, using **MPFR** (not raw `mpz_t`). MPFR provides
-> built-in sqrt, exp, ln, sin, cos, tan, π with correct rounding — so most BigDecimal
-> operations are simple one-call wrappers rather than Newton/Taylor implementations.
-> BigInt integration is deferred to Phase 3.
-
-### Phase 1: Foundation — MPFR C Wrapper & GMPDecimal Struct
-
-**Goal**: Establish the MPFR-based C wrapper with runtime detection, the `GMPDecimal`
-intermediate struct, and prove end-to-end with `sqrt`.
-
-| Step | Task                                                | Detail                                                                                                                                                                                                                                                                                                           | Effort | Deps     |
-| ---- | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------ | -------- |
-| 1.1  | **Install MPFR**                                    | `brew install mpfr` (macOS). Verify: `/opt/homebrew/lib/libmpfr.dylib` exists, `#include <mpfr.h>` compiles. MPFR depends on GMP (already installed).                                                                                                                                                            | Tiny   | —        |
-| 1.2  | **Extend C wrapper with `mpfr_t` handle pool**      | Add to `gmp_wrapper.c`: (a) `static mpfr_t f_handles[MAX_HANDLES]` pool, (b) `mpfrw_init(prec_bits) → handle`, (c) `mpfrw_clear(h)`, (d) `mpfrw_set_str(h, str, len)` (length-safe, base-10), (e) `mpfrw_get_str(h, digits) → char*`, (f) `mpfrw_available()`. Keep existing `mpz_t` wrappers intact for BigInt. | Medium | 1.1      |
-| 1.3  | **Add MPFR arithmetic wrappers**                    | One-line C wrappers: `mpfrw_add(r,a,b)`, `mpfrw_sub(r,a,b)`, `mpfrw_mul(r,a,b)`, `mpfrw_div(r,a,b)`, `mpfrw_sqrt(r,a)`, `mpfrw_neg(r,a)`, `mpfrw_abs(r,a)`, `mpfrw_cmp(a,b)`. All use `MPFR_RNDN` (round-to-nearest). Each is 1 line: e.g. `mpfr_sqrt(f_handles[r], f_handles[a], MPFR_RNDN)`.                   | Small  | 1.2      |
-| 1.4  | **Add MPFR transcendental wrappers**                | `mpfrw_exp(r,a)`, `mpfrw_log(r,a)`, `mpfrw_sin(r,a)`, `mpfrw_cos(r,a)`, `mpfrw_tan(r,a)`, `mpfrw_const_pi(r)`, `mpfrw_pow(r,a,b)`, `mpfrw_rootn_ui(r,a,n)`. Each is 1 line.                                                                                                                                      | Small  | 1.2      |
-| 1.5  | **Implement lazy `dlopen` loading**                 | Wrap all MPFR calls behind function pointers resolved via `dlopen`/`dlsym` at first use (see Section 10.4). This makes the wrapper compilable without MPFR headers. Add `mpfrw_available() → 0/1`.                                                                                                               | Medium | 1.3, 1.4 |
-| 1.6  | **Compile wrapper, run smoke test**                 | Build `libdecimo_gmp_wrapper.dylib` linking against `-lmpfr -lgmp`. Write minimal Mojo test: `mpfrw_init(200) → set_str("3.14") → mpfrw_sqrt → get_str → print`. Verify output.                                                                                                                                  | Small  | 1.5      |
-| 1.7  | **Create `src/decimo/bigdecimal/gmp_decimal.mojo`** | Implement `GMPDecimal` struct (single `handle: Int32` field). Constructor from BigDecimal via `str(bd) → mpfrw_set_str`. `to_bigdecimal(precision) → mpfrw_get_str → BigDecimal(s)`. Destructor calls `mpfrw_clear`.                                                                                             | Medium | 1.6      |
-| 1.8  | **Wire `gmp=True` into `sqrt()`**                   | In `exponential.mojo`: add `gmp: Bool = False` parameter. If True: `GMPDecimal(x, prec+10) → mpfrw_sqrt → to_bigdecimal(prec)`. Three lines of new code.                                                                                                                                                         | Small  | 1.7      |
-| 1.9  | **Test sqrt correctness**                           | For x ∈ {2, 3, 0.5, 100, 10⁻⁸, 10¹⁰⁰⁰}, P ∈ {28, 100, 500, 1000, 5000}: verify `sqrt(x, P, gmp=True)` matches `sqrt(x, P)` to P digits.                                                                                                                                                                          | Medium | 1.8      |
-| 1.10 | **Benchmark sqrt**                                  | Measure wall time including conversion overhead. Find break-even precision where MPFR wins. Expected: above ~50–100 digits.                                                                                                                                                                                      | Small  | 1.9      |
-| 1.11 | **Update build system**                             | `pixi.toml` tasks: `build-gmp-wrapper` (compiles C wrapper + links MPFR). `build_gmp_wrapper.sh` for macOS + Linux.                                                                                                                                                                                              | Small  | 1.2      |
-
-**Deliverable**: `sqrt(BigDecimal("2"), precision=5000, gmp=True)` returns correct result.
-Internally: 1 string conversion in, 1 MPFR call, 1 string conversion out.
-
-### Phase 2: All BigDecimal Operations
-
-**Goal**: Extend `gmp=True` to all compute-heavy operations. Because MPFR has built-in
-functions for each, every operation is just a **thin wrapper** — no Newton/Taylor needed.
-
-| Step | Task                                      | Detail                                                                                                                                                       | Effort | Deps    |
-| ---- | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------ | ------- |
-| 2.1  | **Wire `true_divide(x, y, P, gmp=True)`** | `GMPDecimal(x) → GMPDecimal(y) → mpfrw_div → to_bigdecimal`.                                                                                                 | Small  | Phase 1 |
-| 2.2  | **Wire `root(x, n, P, gmp=True)`**        | `mpfrw_rootn_ui(r, x_handle, n)` — single MPFR call.                                                                                                         | Small  | Phase 1 |
-| 2.3  | **Wire `exp(x, P, gmp=True)`**            | `mpfrw_exp(r, x_handle)` — single MPFR call.                                                                                                                 | Small  | Phase 1 |
-| 2.4  | **Wire `ln(x, P, gmp=True)`**             | `mpfrw_log(r, x_handle)` — single MPFR call.                                                                                                                 | Small  | Phase 1 |
-| 2.5  | **Wire `sin(x, P, gmp)`, `cos`, `tan`**   | `mpfrw_sin`, `mpfrw_cos`, `mpfrw_tan` — each single call.                                                                                                    | Small  | Phase 1 |
-| 2.6  | **Wire `pi(P, gmp=True)`**                | `mpfrw_const_pi(r)` — single MPFR call. No Chudnovsky needed.                                                                                                | Small  | Phase 1 |
-| 2.7  | **Instance method wrappers**              | Update `self.sqrt()`, `self.exp()`, etc. to accept `gmp: Bool = False`.                                                                                      | Small  | 2.6     |
-| 2.8  | **Comprehensive test suite**              | For each function × {gmp=True, gmp=False}: verify agreement to requested precision. Edge cases: 0, negative, very large/small. Compare against WolframAlpha. | Large  | 2.7     |
-| 2.9  | **Benchmark all operations**              | Table: operation × precision (28, 100, 500, 1000, 5000) × gmp. Include conversion overhead.                                                                  | Medium | 2.8     |
-
-**Note**: Because each MPFR operation is a single C call, Phase 2 is significantly
-simpler than it would be with `mpz_t` (where we'd need to implement Taylor series,
-Chudnovsky, and Newton iteration ourselves in Mojo using GMPDecimal arithmetic).
-
-**Deliverable**: All iterative BigDecimal functions support `gmp=True`.
+## 13. Lessons from Other Libraries
+
+Before diving into the roadmap, I looked at how major open-source libraries integrate GMP/MPFR to see what I can learn from them.
+
+### 13.1 mpmath (Python) — Backend Abstraction
+
+mpmath is *the* arbitrary-precision math library for Python (~1M+ monthly downloads).
+
+**How mpmath does it**:
+
+- **Backend swap**: `backend.py` defines `MPZ` (integer type) and `BACKEND` (string).
+  At startup it tries `import gmpy2`, then `import gmp`, falling back to Python `int`.
+  All mantissa arithmetic goes through `MPZ` — a single point of substitution.
+- **Environment override**: `MPMATH_NOGMPY` env var disables GMP entirely,
+  useful for debugging and pure-Python deployment.
+- **gmpy acceleration hooks**: When gmpy is available, mpmath replaces its
+  `_normalize` and `from_man_exp` hot-path functions with optimized C versions
+  from gmpy (`gmpy._mpmath_normalize`, `gmpy._mpmath_create`), and also
+  `mpf_mul_int` is swapped to a gmpy-accelerated version. This gives 2–5×
+  speedup on core arithmetic with zero API change.
+- **Internal representation**: Binary floating-point tuples `(sign, man, exp, bc)`.
+  All precision is in **bits**, converted to/from decimal digits via
+  `dps_to_prec(n) = max(1, int(round((n+1)*3.3219...)))` and
+  `prec_to_dps(n) = max(1, int(round(n/3.3219...))-1)`.
+- **Guard bits**: Extra precision is added internally. Individual functions use
+  `extraprec` (add N bits) or `workprec` (set to N bits). The `autoprec` decorator
+  doubles precision until results converge — a "guess-and-verify" paradigm.
+
+**Takeaway**: The single-variable backend swap (`MPZ = gmpy.mpz`) is clean.
+My C wrapper approach is analogous but at a lower level. I should make sure
+that all GMP-dependent code stays isolated in a single module.
+
+### 13.2 gmpy2 (Python ↔ MPFR) — Context-Based Precision
+
+gmpy2 is the standard Python binding for GMP + MPFR + MPC.
+
+**How gmpy2 does it**:
+
+- **Context object**: `gmpy2.context()` controls precision, rounding mode, exception
+  traps, and exponent range. `get_context()` returns the active context;
+  `local_context()` provides temporary overrides (similar to Python's `decimal.localcontext()`).
+- **5 rounding modes**: RoundToNearest, RoundToZero, RoundUp, RoundDown, RoundAwayZero.
+  These map directly to MPFR's rounding modes.
+- **Thread safety**: MPFR data (flags, emin/emax, default precision) uses
+  thread-local storage when built with `--enable-thread-safe`.
+- **Exception model**: Flags for overflow, underflow, inexact, invalid, divzero,
+  erange. Each has a corresponding `trap_*` bool — if True, raises a Python
+  exception; if False, sets the flag silently and returns a special value (NaN, ±Inf).
+- **Precision tracking**: Each `mpfr` value carries its own precision. When combining
+  values of different precisions, the context precision governs the result.
+- **Implicit conversion**: `mpz → mpfr` is automatic. `float → mpfr` preserves
+  all 53 bits. `str → mpfr` uses full specified precision.
+
+**Takeaway**: My `gmp: Bool = False` parameter approach is simpler but
+sufficient. If I later want a richer API, a context object could control precision,
+rounding mode, and GMP-enable globally. For now, per-call `gmp=True` is fine.
+
+### 13.3 Arb/FLINT — Ball Arithmetic & Guard Bits
+
+Arb (now merged into FLINT) implements rigorous arbitrary-precision ball arithmetic.
+
+**How Arb works**:
+
+- **Ball semantics**: Each number is `[midpoint ± radius]`. Operations propagate
+  error bounds automatically. The user doesn't manually track rounding errors.
+- **Binary ↔ decimal conversion**: Arb explicitly warns that this conversion is
+  lossy. `arb_set_str("2.3", prec)` correctly creates a ball containing 23/10,
+  unlike a naive `double` assignment. `arb_printn()` only prints digits that are
+  guaranteed correct.
+- **Guess-and-verify paradigm**: Start with modest precision, check if the result
+  is accurate enough, double precision and retry if not. This is the recommended
+  approach for complex computations.
+- **Guard bits heuristic**: For accuracy of p bits, use working precision O(1)+p.
+  The O(1) term depends on the expression. "Adding a few guard bits" (10–20 for
+  basic operations) is usually enough.
+- **Polynomial time guarantee**: Arb caps internal work parameters by polynomial
+  functions of `prec` to prevent runaway computation. E.g., `arb_sin(huge_x)` returns
+  `[±1]` instead of attempting expensive argument reduction.
+- **Handle lifecycle**: GMP-style `arb_init(x)` / `arb_clear(x)`. Output parameters
+  come first, then inputs, then precision.
+
+**Takeaway**: The guard bits pattern confirms my approach of using
+`prec + 10` in MPFR calls. I should also add a precision cap for safety.
+The polynomial time guarantee is worth stealing for my iterative operations.
+
+### 13.4 python-flint — Direct FLINT/Arb Bindings
+
+python-flint wraps FLINT (which includes Arb) for Python.
+
+**How python-flint works**:
+
+- **Type hierarchy**: Separate types for each number ring: `fmpz` (integers),
+  `fmpq` (rationals), `arb` (real balls), `acb` (complex balls), plus polynomial
+  and matrix variants.
+- **Global context**: Precision is set globally (`ctx.prec = 333` for ~100 decimal
+  digits). All operations use the current context precision.
+
+### 13.5 Consistency Check: My Plan vs. Best Practices
+
+| Best Practice (from survey)                | Status in My Plan   | Notes                                                             |
+| ------------------------------------------ | ------------------- | ----------------------------------------------------------------- |
+| Graceful fallback (GMP optional)           | ✓ Designed          | BigDecimal works without MPFR; BigFloat requires it               |
+| First-class user-facing type               | ✓ BigFloat          | Like mpmath's `mpf` — user constructs and chains operations       |
+| Convenience sugar on existing type         | ✓ `gmp=True`        | BigDecimal ops accept `gmp=True` for one-off acceleration         |
+| Isolated backend module                    | ✓ `bigfloat.mojo`   | All MPFR calls in one file                                        |
+| Guard bits in MPFR calls                   | ✓ `prec + 10`       | Confirmed adequate by Arb docs                                    |
+| Decimal ↔ binary via string (not raw bits) | ✓ Planned           | `mpfr_set_str` / `mpfr_get_str`, base 10                          |
+| Environment variable to disable GMP        | ✗ **Not yet**       | **TODO**: `DECIMO_NOGMP` env check (like mpmath `MPMATH_NOGMPY`)  |
+| Handle pool with lifecycle management      | ✓ Planned           | C wrapper handle pool                                             |
+| Rounding mode control                      | △ Partial           | RNDN only. Consider exposing rounding mode later                  |
+| Precision cap for safety                   | ✗ **Not yet**       | **TODO**: Cap internal MPFR precision to prevent OOM              |
+| Conversion overhead awareness              | ✓ Documented        | String round-trip mitigated by BigFloat chaining + lazy future    |
+| Thread safety considerations               | ✗ **Not yet**       | **TODO**: MPFR handle pool is not thread-safe; document this      |
+| Benchmark-driven thresholds                | ✓ Planned (Phase 1) | Find break-even precision for each operation                      |
+| Lazy evaluation for chained ops            | △ Future            | `BigDecimal.lazy()...collect()` — deferred, architecturally clean |
+
+**Gaps I found** (incorporated into the roadmap below):
+
+1. **`DECIMO_NOGMP` environment variable** (Phase 1) — lets users disable GMP at
+   runtime, like mpmath's `MPMATH_NOGMPY`.
+2. **Precision cap** (Phase 1) — prevent MPFR from allocating unbounded memory
+   for extreme precision values.
+3. **Thread safety docs** (Phase 4) — the C handle pool is not thread-safe;
+   need to document this.
+
+## 14. Implementation Roadmap
+
+> BigFloat first (Phase 1), then BigDecimal `gmp=True` sugar (Phase 2), then BigInt
+> backend (Phase 3). MPFR has built-in sqrt, exp, ln, sin, cos, tan, π with correct
+> rounding — so BigFloat operations are single MPFR calls, and BigDecimal sugar is
+> just BigFloat under the hood.
+
+### Phase 1: Foundation — MPFR C Wrapper & BigFloat
+
+**Goal**: Get the MPFR-based C wrapper working, build the `BigFloat` struct as a
+first-class user-facing type, and prove the pipeline with `sqrt`.
+
+- [ ] **1.1 Install MPFR** (Tiny, no deps)
+  `brew install mpfr` (macOS). Verify: `/opt/homebrew/lib/libmpfr.dylib` exists,
+  `#include <mpfr.h>` compiles. MPFR depends on GMP (already installed).
+- [ ] **1.2 Extend C wrapper with `mpfr_t` handle pool** (Medium, deps: 1.1)
+  Add to `gmp_wrapper.c`:
+  (a) `static mpfr_t f_handles[MAX_HANDLES]` pool,
+  (b) `mpfrw_init(prec_bits) → handle`,
+  (c) `mpfrw_clear(h)`,
+  (d) `mpfrw_set_str(h, str, len)` (length-safe, base-10),
+  (e) `mpfrw_get_str(h, digits) → char*`,
+  (f) `mpfrw_available()`.
+  Keep existing `mpz_t` wrappers intact for BigInt.
+  **Add precision cap**: reject `prec_bits > MAX_PREC` (e.g. 1M bits ≈ 300K digits)
+  to prevent OOM, inspired by Arb's polynomial time guarantee.
+- [ ] **1.3 Add MPFR arithmetic wrappers** (Small, deps: 1.2)
+  One-line C wrappers: `mpfrw_add(r,a,b)`, `mpfrw_sub(r,a,b)`, `mpfrw_mul(r,a,b)`,
+  `mpfrw_div(r,a,b)`, `mpfrw_sqrt(r,a)`, `mpfrw_neg(r,a)`, `mpfrw_abs(r,a)`,
+  `mpfrw_cmp(a,b)`. All use `MPFR_RNDN` (round-to-nearest). Each is 1 line.
+- [ ] **1.4 Add MPFR transcendental wrappers** (Small, deps: 1.2)
+  `mpfrw_exp(r,a)`, `mpfrw_log(r,a)`, `mpfrw_sin(r,a)`, `mpfrw_cos(r,a)`,
+  `mpfrw_tan(r,a)`, `mpfrw_const_pi(r)`, `mpfrw_pow(r,a,b)`, `mpfrw_rootn_ui(r,a,n)`.
+  Each is 1 line.
+- [ ] **1.5 Implement lazy `dlopen` loading** (Medium, deps: 1.3, 1.4)
+  Wrap all MPFR calls behind function pointers resolved via `dlopen`/`dlsym` at
+  first use (see Section 10.4). This makes the wrapper compilable without MPFR
+  headers. Add `mpfrw_available() → 0/1`.
+  **Add `DECIMO_NOGMP` check**: if set, `mpfrw_available()` returns 0 without
+  attempting `dlopen` (inspired by mpmath's `MPMATH_NOGMPY`).
+- [ ] **1.6 Compile wrapper, run smoke test** (Small, deps: 1.5)
+  Build `libdecimo_gmp_wrapper.dylib` linking against `-lmpfr -lgmp`. Write minimal
+  Mojo test: `mpfrw_init(200) → set_str("3.14") → mpfrw_sqrt → get_str → print`.
+  Verify output.
+- [ ] **1.7 Create `src/decimo/bigfloat/bigfloat.mojo`** (Medium, deps: 1.6)
+  Implement `BigFloat` struct (single `handle: Int32` field). Constructor from
+  string via `mpfrw_set_str`. Constructor from BigDecimal via `str(bd) → mpfrw_set_str`.
+  `to_bigdecimal(precision) → mpfrw_get_str → BigDecimal(s)`. Destructor calls
+  `mpfrw_clear`. Use guard bits: init MPFR with `dps_to_prec(precision) + 20` bits.
+- [ ] **1.8 BigFloat arithmetic and transcendentals** (Medium, deps: 1.7)
+  Wire all MPFR wrappers into BigFloat methods: `sqrt()`, `exp()`, `ln()`, `sin()`,
+  `cos()`, `tan()`, `root(n)`, `power(y)`, `+`, `-`, `*`, `/`, `pi()`.
+  Each is a thin wrapper around the corresponding `mpfrw_*` call.
+- [ ] **1.9 BigFloat ↔ BigDecimal conversion** (Small, deps: 1.7)
+  `BigFloat(bd: BigDecimal, precision)` and `bf.to_bigdecimal(precision)`.
+  Both go through string representation (`mpfr_set_str` / `mpfr_get_str`).
+- [ ] **1.10 Test BigFloat correctness** (Medium, deps: 1.8)
+  For each operation × x ∈ {2, 3, 0.5, 100, 10⁻⁸, 10¹⁰⁰⁰} × P ∈ {28, 100, 500,
+  1000, 5000}: verify BigFloat results match BigDecimal results to P digits.
+  Edge cases: 0, negative, very large/small.
+- [ ] **1.11 Benchmark BigFloat** (Small, deps: 1.10)
+  Measure wall time for BigFloat vs BigDecimal across operations and precisions.
+  Find break-even precision where BigFloat wins. Expected: above ~50–100 digits.
+- [ ] **1.12 Update build system** (Small, deps: 1.2)
+  `pixi.toml` tasks: `build-gmp-wrapper` (compiles C wrapper + links MPFR).
+  `build_gmp_wrapper.sh` for macOS + Linux.
+
+**Deliverable**: `BigFloat("2", 5000).sqrt()` works. Users can construct BigFloat,
+chain operations, and convert back to BigDecimal.
+
+### Phase 2: BigDecimal `gmp=True` Convenience Sugar
+
+**Goal**: Wire `gmp=True` into BigDecimal compute-heavy operations. Internally each
+uses BigFloat (Phase 1), so this is just plumbing.
+
+- [ ] **2.1 Wire `sqrt(x, P, gmp=True)`** (Small, deps: Phase 1)
+  If `gmp=True`: `BigFloat(x, P+10).sqrt().to_bigdecimal(P)`. Three lines.
+- [ ] **2.2 Wire `true_divide(x, y, P, gmp=True)`** (Small, deps: Phase 1)
+  `BigFloat(x, P+10) / BigFloat(y, P+10) → to_bigdecimal(P)`.
+- [ ] **2.3 Wire `root`, `exp`, `ln`, `sin`, `cos`, `tan`** (Small, deps: Phase 1)
+  Each is a one-liner delegating to BigFloat. Single MPFR call per operation.
+- [ ] **2.4 Wire `pi(P, gmp=True)`** (Small, deps: Phase 1)
+  `BigFloat.pi(P+10).to_bigdecimal(P)`. No Chudnovsky needed.
+- [ ] **2.5 Instance method wrappers** (Small, deps: 2.4)
+  Update `self.sqrt()`, `self.exp()`, etc. to accept `gmp: Bool = False`.
+- [ ] **2.6 Comprehensive test suite** (Large, deps: 2.5)
+  For each function × {gmp=True, gmp=False}: verify agreement to requested
+  precision. Edge cases: 0, negative, very large/small.
+- [ ] **2.7 Benchmark gmp=True vs gmp=False** (Medium, deps: 2.6)
+  Table: operation × precision (28, 100, 500, 1000, 5000) × gmp. Include
+  conversion overhead.
+
+**Deliverable**: All iterative BigDecimal functions support `gmp=True`. Existing
+BigDecimal API unchanged for `gmp=False`.
 
 **Checkpoint test**: `pi(precision=10000, gmp=True)` matches known π digits and runs
 significantly faster than `pi(precision=10000)`.
 
 ### Phase 3: BigInt GMP Backend (Deferred)
 
-**Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred because integer
-operations are less user-facing.
+**Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred — integer
+ops are less user-facing.
 
-| Step | Task                                            | Detail                                                                 | Effort | Deps    |
-| ---- | ----------------------------------------------- | ---------------------------------------------------------------------- | ------ | ------- |
-| 3.1  | **Word import/export in C wrapper**             | Complete `gmpw_import_u32_le` / `gmpw_export_u32_le`. Test round-trip. | Small  | Phase 1 |
-| 3.2  | **Create `src/decimo/bigint/gmp_backend.mojo`** | GMPBigInt struct. Convert BigInt ↔ GMP via O(n) word import/export.    | Medium | 3.1     |
-| 3.3  | **Add `gmp: Bool = False` to BigInt ops**       | On `multiply`, `floor_divide`, `sqrt`, `gcd`, `power` free functions.  | Medium | 3.2     |
-| 3.4  | **Tests and benchmarks**                        | Verify correctness. Confirm 7–18× speedup.                             | Medium | 3.3     |
+- [ ] **3.1 Word import/export in C wrapper** (Small, deps: Phase 1)
+  Complete `gmpw_import_u32_le` / `gmpw_export_u32_le`. Test round-trip.
+- [ ] **3.2 Create `src/decimo/bigint/gmp_backend.mojo`** (Medium, deps: 3.1)
+  GMPBigInt struct. Convert BigInt ↔ GMP via O(n) word import/export.
+- [ ] **3.3 Add `gmp: Bool = False` to BigInt ops** (Medium, deps: 3.2)
+  On `multiply`, `floor_divide`, `sqrt`, `gcd`, `power` free functions.
+- [ ] **3.4 Tests and benchmarks** (Medium, deps: 3.3)
+  Verify correctness. Confirm 7–18× speedup.
 
 ### Phase 4: Polish, Distribution & Future
 
-| Step | Task                                                                     | Effort | Deps             |
-| ---- | ------------------------------------------------------------------------ | ------ | ---------------- |
-| 4.1  | **Cross-platform wrapper compilation (Linux)**                           | Small  | Phase 1          |
-| 4.2  | **Pre-built wrapper binaries (macOS ARM64, Linux x86)**                  | Medium | 4.1              |
-| 4.3  | **CLI `--gmp` flag**                                                     | Medium | Phase 2          |
-| 4.4  | **GMPDecimal expression chaining** (user-facing)                         | Medium | Phase 2          |
-|      | ↳ `gx = GMPDecimal(x); r = gx.sqrt().multiply(gx.exp()).to_bigdecimal()` |        |                  |
-| 4.5  | **Documentation: user guide for GMP/MPFR integration**                   | Small  | Phase 2          |
-| 4.6  | **Pure-Mojo runtime detection (when DLHandle available)**                | Medium | Mojo improvement |
-| 4.7  | **GMP rational numbers (`mpq_t`) for exact division**                    | Medium | Future           |
+- [ ] **4.1 Cross-platform wrapper compilation (Linux)** (Small, deps: Phase 1)
+- [ ] **4.2 Pre-built wrapper binaries (macOS ARM64, Linux x86)** (Medium, deps: 4.1)
+- [ ] **4.3 CLI `--mode float` and `--gmp` flags** (Medium, deps: Phase 2)
+  `--mode float` evaluates in BigFloat; `--gmp` enables `gmp=True` on BigDecimal ops.
+- [ ] **4.4 Documentation: user guide for BigFloat and GMP integration** (Small, deps: Phase 2)
+- [ ] **4.5 Pure-Mojo runtime detection (when DLHandle available)** (Medium, deps: Mojo improvement)
+- [ ] **4.6 GMP rational numbers (`mpq_t`) for exact division** (Medium, future)
+- [ ] **4.7 Document thread safety** (Small, deps: Phase 1)
+  The C handle pool is not thread-safe; MPFR requires `--enable-thread-safe` for
+  concurrent use. Document this limitation.
+- [ ] **4.8 Lazy evaluation: `BigDecimal.lazy()...collect()`** (Large, future)
+  Record a DAG of operations, execute entire chain in MPFR with zero intermediate
+  conversions. See Section 9.9 for design. Architecturally clean because BigFloat
+  and the MPFR wrapper (Phase 1) already provide the backend.
 
 ### Estimated Effort Summary
 
-| Phase                       | Scope                                          | Steps | Complexity                                    | Priority |
-| --------------------------- | ---------------------------------------------- | ----- | --------------------------------------------- | -------- |
-| Phase 1: Foundation + sqrt  | MPFR wrapper, GMPDecimal, runtime detect, sqrt | 11    | Medium — most work is C wrapper + plumbing    | **HIGH** |
-| Phase 2: All BigDecimal ops | divide, exp, ln, trig, pi, root                | 9     | **Low** — each op is a thin wrapper over MPFR | **HIGH** |
-| Phase 3: BigInt backend     | Word import/export, mpz_t dispatch             | 4     | Medium                                        | Deferred |
-| Phase 4: Polish             | CLI, cross-platform, docs, chaining            | 7     | Varies                                        | Low      |
+- **Phase 1: MPFR wrapper + BigFloat** — 12 steps, Medium complexity (most work is C wrapper + BigFloat struct). **HIGH** priority.
+- **Phase 2: BigDecimal gmp=True sugar** — 7 steps, **Low** complexity (each op delegates to BigFloat). **HIGH** priority.
+- **Phase 3: BigInt backend** — 4 steps, Medium complexity. Deferred.
+- **Phase 4: Polish + lazy eval** — 8 steps, varies. Low priority.
 
----
+## 15. Appendix: Prototype Results
 
-## 14. Appendix: Prototype Results
-
-### 14.1 Files Created
+### 15.1 Files Created
 
 | File                                | Purpose                   | Status                                 |
 | ----------------------------------- | ------------------------- | -------------------------------------- |
@@ -1151,7 +1356,7 @@ operations are less user-facing.
 | `temp/gmp/bench_gmp_vs_decimo.mojo` | GMP benchmark suite       | ✓ Complete                             |
 | `temp/gmp/build_and_test.sh`        | Build and test automation | ✓ Complete                             |
 
-### 14.2 C Wrapper API Summary
+### 15.2 C Wrapper API Summary
 
 ```c
 // Lifecycle
@@ -1198,12 +1403,12 @@ int         gmpw_available(void);
 const char* gmpw_version(void);
 ```
 
-### 14.3 Sample Mojo FFI Calls
+### 15.3 Sample Mojo FFI Calls
 
 ```mojo
 from sys.ffi import external_call, c_int
 
-# ⚠️ SAFE pattern: pass length explicitly to avoid use-after-free (see Section 15.1)
+# ⚠️ SAFE pattern: pass length explicitly to avoid use-after-free (see Section 16.1)
 def gmp_set_str(h: c_int, s: String):
     var slen = c_int(len(s))
     var ptr = Int(s.unsafe_ptr())
@@ -1223,7 +1428,7 @@ def gmp_add(r: c_int, a: c_int, b: c_int):
     external_call["gmpw_add", NoneType, c_int, c_int, c_int](r, a, b)
 ```
 
-### 14.4 Benchmark Data Table
+### 15.4 Benchmark Data Table
 
 **GMP absolute timings** (per operation, macOS ARM64, Apple M-series):
 
@@ -1235,18 +1440,15 @@ def gmp_add(r: c_int, a: c_int, b: c_int):
 | 10,000  | 50 µs    | 226 ns | 30 µs  | 21 µs  | 2 µs   |
 | 100,000 | 939 µs   | 2 µs   | 740 µs | 642 µs | 324 µs |
 
----
-
-## 15. Lessons Learned from Prototype Development
+## 16. Lessons Learned from Prototype Development
 
-This section documents issues encountered during the GMP prototype build-test cycle
-and their fixes. **Read this before starting implementation** — every item below
-cost significant debugging time and will recur if not handled.
+This section documents gotchas from the GMP prototype build-test cycle
+and how I fixed them. **Read this before starting implementation** — every item
+cost me real debugging time and will bite again if not handled.
 
-### 15.1 CRITICAL: Mojo String Use-After-Free in FFI Calls
+### 16.1 CRITICAL: Mojo String Use-After-Free in FFI
 
-**Symptom**: GMP's `mpz_set_str()` silently fails (returns -1), leaving the mpz value
-as 0. Subsequent operations on zero values trigger `__gmp_overflow_in_mpz` in division
+**Symptom**: GMP's `mpz_set_str()` silently fails (returns -1), leaving the mpz as 0. Subsequent operations on zero values trigger `__gmp_overflow_in_mpz` in division
 or crash with corrupt data.
 
 **Root cause**: When you write:
@@ -1256,15 +1458,15 @@ def gmp_set_str(h: c_int, s: String):
     _ = external_call["gmpw_set_str", c_int, c_int, Int](h, Int(s.unsafe_ptr()))
 ```
 
-Mojo's optimizer may drop the `String` parameter `s` (freeing its backing buffer) before
-`external_call` actually reads the pointer. Under heavy allocation pressure (millions of
-BigInt/BigDecimal operations), the freed buffer is immediately reused by the allocator,
-overrwriting the null terminator. GMP then reads past the intended string into adjacent
-memory, encounters non-digit bytes, and returns error code -1.
+Mojo's optimizer may drop the `String` parameter `s` (freeing its buffer) before
+`external_call` reads the pointer. Under heavy allocation pressure (millions of
+BigInt/BigDecimal ops), the freed buffer gets reused immediately,
+overwriting the null terminator. GMP reads past the string into adjacent
+memory, hits non-digit bytes, returns -1.
 
-**How we found it**: The crash only appeared in the full benchmark binary after
-millions of prior allocations, never in small test files. Adding `fprintf(stderr)`
-logging to the C wrapper revealed that:
+**How I found it**: The crash only showed up in the full benchmark binary after
+millions of prior allocations — never in small test files. Adding `fprintf(stderr)`
+logging to the C wrapper revealed:
 
 - `strlen()` on the pointer showed 109 instead of expected 100
 - `mpz_set_str` returned -1 (failure), not 0 (success)
@@ -1292,17 +1494,17 @@ def gmp_set_str(h: c_int, s: String):
     _ = external_call["gmpw_set_str_n", c_int, c_int, Int, c_int](h, ptr, slen)
 ```
 
-**Rule**: **ALWAYS use `gmpw_set_str_n` with explicit length for any String → C FFI
-call.** Never rely on null-termination of Mojo `String.unsafe_ptr()` in `external_call`.
+**Rule**: **Always use `gmpw_set_str_n` with explicit length for any String → C FFI
+call.** Don't rely on null-termination of Mojo `String.unsafe_ptr()` in `external_call`.
 
-**Broader implication**: This affects ALL FFI calls that pass Mojo `String` pointers.
-Every such call in the codebase must either:
+**Broader implication**: This affects ALL FFI calls passing Mojo `String` pointers.
+Every such call must either:
 
 1. Use the length-aware C variant (recommended), or
 2. Store the String in a local variable AND ensure no other allocation occurs between
    `unsafe_ptr()` extraction and the `external_call` (fragile, not recommended)
 
-### 15.2 `mojo run` Does Not Support `-Xlinker` Flags
+### 16.2 `mojo run` Does Not Support `-Xlinker` Flags
 
 **Symptom**: `mojo run file.mojo -Xlinker -lgmp_wrapper` ignores the linker flags,
 causing "undefined symbol" errors at runtime.
@@ -1322,7 +1524,7 @@ pixi run mojo build -Xlinker -L... -Xlinker -lgmp_wrapper -o program program.moj
 pixi run mojo run -Xlinker -L... program.mojo
 ```
 
-### 15.3 Mojo 0.26.2 Syntax Differences
+### 16.3 Mojo 0.26.2 Syntax Differences
 
 Several Mojo syntax changes from earlier versions caused compile errors:
 
@@ -1333,29 +1535,28 @@ Several Mojo syntax changes from earlier versions caused compile errors:
 | UInt → Float64     | `Float64(some_uint)`               | `Float64(Int(some_uint))`                      |
 | UnsafePointer      | `UnsafePointer[Byte](address=ptr)` | `UnsafePointer[Byte](unsafe_from_address=ptr)` |
 
-### 15.4 GMP `__gmp_overflow_in_mpz` Means Bad Input, Not Overflow
+### 16.4 GMP `__gmp_overflow_in_mpz` Means Bad Input, Not Overflow
 
-Despite the name, `__gmp_overflow_in_mpz` typically means the `mpz_t` struct has
-corrupted metadata (alloc=0, size=0, or invalid limb pointer), NOT that a computation
-result was too large. If you see this error:
+Despite the name, `__gmp_overflow_in_mpz` usually means the `mpz_t` struct has
+corrupted metadata (alloc=0, size=0, or bad limb pointer), NOT that a result was too large. If you see this:
 
 1. **First check**: Did `mpz_set_str` fail? (Return value -1 means the string was invalid.)
 2. **Second check**: Was the handle cleared and reused?
 3. **Third check**: Was the handle ever initialized?
 
-In our case it was always (1) — the string pointer was dangling.
+In my case it was always (1) — the string pointer was dangling.
 
-### 15.5 Handle Pool Sizing
+### 16.5 Handle Pool Sizing
 
-The C wrapper uses a static array of `MAX_HANDLES = 4096` handles. For Phase 1 (sqrt),
+The C wrapper uses a static array of `MAX_HANDLES = 4096`. For Phase 1 (sqrt),
 each Newton iteration typically uses 3–5 temporary handles, so 4096 is plenty. But for
 Phase 2 (complex expression trees), monitor handle usage. Consider:
 
 - Adding `gmpw_handles_in_use()` diagnostic function
 - Expanding to dynamic allocation if 4096 is insufficient
-- Ensuring every code path calls `gmpw_clear()` (RAII via `GMPDecimal.__del__`)
+- Ensuring every code path calls `gmpw_clear()` (RAII via `BigFloat.__del__`)
 
-### 15.6 Build Command Template
+### 16.6 Build Command Template
 
 The full build command for any GMP-linked Mojo file on macOS:
 
@@ -1377,7 +1578,7 @@ Key points:
 - The binary must be run from the directory containing `libgmp_wrapper.dylib`
   (or use absolute rpath)
 
-### 15.7 Debugging Tip: Add `fprintf(stderr)` to C Wrapper
+### 16.7 Debugging Tip: Add `fprintf(stderr)` to C Wrapper
 
 When debugging FFI issues, temporarily add `fprintf(stderr)` logging to the C wrapper
 functions. This is invaluable because Mojo's error reporting for FFI issues is limited.
@@ -1386,26 +1587,24 @@ The C wrapper can inspect raw pointer values, string contents, and GMP internal
 
 Always remove debug logging before benchmarking — `fprintf` has measurable overhead.
 
----
-
 ## Summary
 
-GMP/MPFR integration is **technically proven and recommended** for Decimo, with BigDecimal
-as the priority target using **MPFR** (not raw `mpz_t`):
-
-1. **MPFR for BigDecimal** — Uses `mpfr_t` (arbitrary-precision binary float) instead of
-   `mpz_t` + manual scale tracking. Every operation (sqrt, exp, ln, sin, cos, tan, π,
-   divide) is a **single MPFR call**. No Newton iteration or Taylor series implementation
-   needed from the Mojo side. The `GMPDecimal` struct has just one field (`handle`).
-2. **`gmp: Bool = False`** — Simple opt-in parameter on compute-heavy functions. Defaults
-   to native Decimo for backward compatibility. Guard digits handle binary↔decimal rounding.
-3. **Single binary, runtime detection** — The C wrapper uses `dlopen`/`dlsym` to load
-   MPFR lazily at runtime (verified working in Mojo 0.26.2). Users who never call
-   `gmp=True` pay zero cost. If MPFR isn't installed, a clear error message is raised.
-4. **BigInt deferred** to Phase 3 (uses `mpz_t` with word import/export).
-5. **CLI `--gmp` deferred** to Phase 4.
-6. **Phase 2 is trivial** — Because MPFR has built-in functions, wiring each BigDecimal
-   operation is a thin wrapper (~5 lines each), not a reimplementation of algorithms.
-
-The prototype in `temp/gmp/` and the lessons in Section 15 provide a foundation for
-implementation. Start with Phase 1, Step 1.1 (`brew install mpfr`).
+GMP/MPFR integration is **proven and worth doing** for Decimo. Two-type architecture:
+
+1. **BigFloat** — new first-class MPFR-backed binary float type. Like mpmath's `mpf`
+   but faster. Every operation (sqrt, exp, ln, sin, cos, tan, π, divide) is a **single
+   MPFR call**. Struct is one field (`handle: Int32`). Requires MPFR — no fallback,
+   no apologies. Users without MPFR use BigDecimal.
+2. **BigDecimal `gmp=True`** — convenience sugar for one-off acceleration. Internally
+   delegates to BigFloat. Backward-compatible: defaults to `gmp=False`, existing code
+   unchanged. Guard digits handle binary↔decimal rounding.
+3. **Two types, two use cases** — BigFloat for fast scientific computing (binary
+   semantics). BigDecimal for exact decimal arithmetic (finance, human-friendly output).
+   Both in one library.
+4. **Single binary, runtime detection** — C wrapper uses `dlopen`/`dlsym` to load
+   MPFR lazily (verified on Mojo 0.26.2). If MPFR isn't installed, BigDecimal works
+   fine without it; BigFloat raises a clear error.
+5. **BigInt** deferred to Phase 3 (`mpz_t` with word import/export, ~10× speedup).
+6. **Future: lazy evaluation** — `BigDecimal.lazy()...collect()` records an operation
+   DAG and executes the whole chain in MPFR with zero intermediate conversions.
+   Deferred to Phase 4, but architecturally clean because BigFloat provides the backend.

From 30454666f9a34855fb7aa152ae065cfae689df03 Mon Sep 17 00:00:00 2001
From: ZHU Yuhao <dr.yuhao.zhu@outlook.com>
Date: Fri, 3 Apr 2026 22:41:04 +0200
Subject: [PATCH 4/6] Update planning

---
 docs/plans/gmp_integration.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/plans/gmp_integration.md b/docs/plans/gmp_integration.md
index d84b951f..7ce47e9a 100644
--- a/docs/plans/gmp_integration.md
+++ b/docs/plans/gmp_integration.md
@@ -1211,7 +1211,7 @@ python-flint wraps FLINT (which includes Arb) for Python.
 ## 14. Implementation Roadmap
 
 > BigFloat first (Phase 1), then BigDecimal `gmp=True` sugar (Phase 2), then BigInt
-> backend (Phase 3). MPFR has built-in sqrt, exp, ln, sin, cos, tan, π with correct
+> backend (Phase 3 or never). MPFR has built-in sqrt, exp, ln, sin, cos, tan, π with correct
 > rounding — so BigFloat operations are single MPFR calls, and BigDecimal sugar is
 > just BigFloat under the hood.
 
@@ -1306,7 +1306,7 @@ BigDecimal API unchanged for `gmp=False`.
 **Checkpoint test**: `pi(precision=10000, gmp=True)` matches known π digits and runs
 significantly faster than `pi(precision=10000)`.
 
-### Phase 3: BigInt GMP Backend (Deferred)
+### Phase 3: BigInt GMP Backend (Deferred, maybe never)
 
 **Goal**: GMP backend for BigInt (base-2³²) using `mpz_t`. Deferred — integer
 ops are less user-facing.

From 6801226ea2e455e4423e369d5d3bc791d2a355a3 Mon Sep 17 00:00:00 2001
From: ZHU Yuhao <dr.yuhao.zhu@outlook.com>
Date: Sat, 4 Apr 2026 00:25:43 +0200
Subject: [PATCH 5/6] Initialize files

---
 .gitignore                            |   3 +
 NOTICE                                |  26 ++
 docs/readme_unreleased.md             |  22 +-
 pixi.toml                             |   4 +
 src/decimo/bigfloat/__init__.mojo     |  30 +++
 src/decimo/bigfloat/bigfloat.mojo     |  98 +++++++
 src/decimo/bigfloat/mpfr_wrapper.mojo | 198 ++++++++++++++
 src/decimo/gmp/README.md              |  43 +++
 src/decimo/gmp/build_gmp_wrapper.sh   |  41 +++
 src/decimo/gmp/gmp_wrapper.c          | 374 ++++++++++++++++++++++++++
 10 files changed, 823 insertions(+), 16 deletions(-)
 create mode 100644 NOTICE
 create mode 100644 src/decimo/bigfloat/__init__.mojo
 create mode 100644 src/decimo/bigfloat/bigfloat.mojo
 create mode 100644 src/decimo/bigfloat/mpfr_wrapper.mojo
 create mode 100644 src/decimo/gmp/README.md
 create mode 100755 src/decimo/gmp/build_gmp_wrapper.sh
 create mode 100644 src/decimo/gmp/gmp_wrapper.c

diff --git a/.gitignore b/.gitignore
index a05a2e40..3b4a4852 100644
--- a/.gitignore
+++ b/.gitignore
@@ -28,6 +28,9 @@ kgen.trace.json*
 local
 # CLI binary
 /decimo
+# Compiled GMP/MPFR wrapper (built from source)
+*.dylib
+*.so
 # Python build artifacts
 *.so
 __pycache__/
diff --git a/NOTICE b/NOTICE
new file mode 100644
index 00000000..fdcee941
--- /dev/null
+++ b/NOTICE
@@ -0,0 +1,26 @@
+Decimo
+Copyright 2025 Yuhao Zhu
+
+Licensed under the Apache License, Version 2.0.
+
+
+Third-Party Dependencies
+========================
+
+Decimo optionally uses the following libraries at runtime, if installed by
+the user. Decimo does not include, copy, or distribute any source code or
+binaries from these projects.
+
+GNU MPFR Library (https://www.mpfr.org/)
+  Copyright (C) Free Software Foundation, Inc.
+  Licensed under the GNU Lesser General Public License v3.0 or later (LGPLv3+).
+
+GNU Multiple Precision Arithmetic Library (GMP) (https://gmplib.org/)
+  Copyright (C) Free Software Foundation, Inc.
+  Licensed under the GNU Lesser General Public License v3.0 or later (LGPLv3+),
+  or the GNU General Public License v2.0 or later (GPLv2+).
+
+MPFR and GMP are loaded at runtime via dlopen/dlsym when the user has
+independently installed them (e.g. `brew install mpfr` on macOS or
+`apt install libmpfr-dev` on Linux). The BigFloat type requires MPFR;
+all other Decimo types work without any external dependencies.
diff --git a/docs/readme_unreleased.md b/docs/readme_unreleased.md
index 59895438..9cf0ec58 100644
--- a/docs/readme_unreleased.md
+++ b/docs/readme_unreleased.md
@@ -3,16 +3,16 @@
 An arbitrary-precision integer and decimal library for [Mojo](https://www.modular.com/mojo), inspired by Python's `int` and `Decimal`.
 
 [![Version](https://img.shields.io/github/v/tag/forfudan/decimo?label=version&color=blue)](https://github.com/forfudan/decimo/releases)
+[![Mojo](https://img.shields.io/badge/mojo-0.26.2-orange)](https://docs.modular.com/mojo/manual/)
 [![pixi](https://img.shields.io/badge/pixi%20add-decimo-purple)](https://prefix.dev/channels/modular-community/packages/decimo)
 [![CI](https://img.shields.io/github/actions/workflow/status/forfudan/decimo/run_tests.yaml?branch=main&label=tests)](https://github.com/forfudan/decimo/actions/workflows/run_tests.yaml)
-[![Last Commit](https://img.shields.io/github/last-commit/forfudan/decimo?color=red)](https://github.com/forfudan/decimo/commits/main)
 
 <!-- 
-[![Mojo](https://img.shields.io/badge/mojo-0.26.2-orange)](https://docs.modular.com/mojo/manual/)
 [![License](https://img.shields.io/github/license/forfudan/decimo)](LICENSE)
 [![Stars](https://img.shields.io/github/stars/forfudan/decimo?style=flat)](https://github.com/forfudan/decimo/stargazers)
 [![Issues](https://img.shields.io/github/issues/forfudan/decimo)](https://github.com/forfudan/decimo/issues)
 ![Platforms](https://img.shields.io/badge/platform-macOS%20%7C%20Linux-lightgrey)
+[![Last Commit](https://img.shields.io/github/last-commit/forfudan/decimo?color=red)](https://github.com/forfudan/decimo/commits/main)
 -->
 
 <!-- 
@@ -40,23 +40,11 @@ The core types are[^auxiliary]:
 | `Decimal` | `BDec`, `BigDecimal` | Equivalent to Python's `decimal.Decimal` | Base-10^9               |
 | `Dec128`  | `Decimal128`         | 128-bit fixed-precision decimal type     | Triple 32-bit words     |
 
----
-
 **Decimo** combines "**Deci**mal" and "**Mo**jo" - reflecting its purpose and implementation language. "Decimo" is also a Latin word meaning "tenth" and is the root of the word "decimal".
 
----
-
-Decimo also includes **a command-line calculator application** that supports arbitrary-precision calculations, powered by [ArgMojo - a command-line argument parser library for Mojo](https://github.com/forfudan/argmojo).
-
-You can use it to evaluate complex mathematical expressions with high precision directly from your terminal. For example:
+A CLI calculator tool, built on top of the Decimo library and powered by [ArgMojo](https://github.com/forfudan/argmojo) (a command-line argument parser library), is also available in this repository. It provides a convenient way to perform high-precision calculations directly from the command line.
 
-```bash
-./decimo "2^10 + sqrt(pi) * ln(e^3) - root(125, 3) / abs(-5) + sin(pi/2)" -p 50 --scientific
-```
-
----
-
-Decimo also includes a built-in [TOML parser](./docs/readme_toml.md) (`decimo.toml`), a lightweight pure-Mojo implementation supporting TOML v1.0. It parses configuration files and test data, supporting basic types, arrays, and nested tables. While created for Decimo's testing framework, it offers general-purpose structured data parsing with a clean, simple API.
+This repository includes a built-in [TOML parser](./docs/readme_toml.md) (`decimo.toml`), a lightweight pure-Mojo implementation supporting TOML v1.0. It parses configuration files and test data, supporting basic types, arrays, and nested tables. While created for Decimo's testing framework, it offers general-purpose structured data parsing with a clean, simple API.
 
 ## Installation
 
@@ -384,6 +372,8 @@ If you find Decimo useful, consider listing it in your citations.
 
 This repository and its contributions are licensed under the Apache License v2.0.
 
+The `BigFloat` type optionally uses the [GNU MPFR Library](https://www.mpfr.org/) (LGPLv3+) and [GMP](https://gmplib.org/) (LGPLv3+ or GPLv2+) at runtime. Decimo does not include or distribute any MPFR/GMP source code or binaries — they are loaded via `dlopen` only if the user has independently installed them. All other Decimo types work without any external dependencies. See the [NOTICE](./NOTICE) file for details.
+
 [^fixed]: The `Decimal128` type can represent values with up to 29 significant digits and a maximum of 28 digits after the decimal point. When a value exceeds the maximum representable value (`2^96 - 1`), Decimo either raises an error or rounds the value to fit within these constraints. For example, the significant digits of `8.8888888888888888888888888888` (29 eights total with 28 after the decimal point) exceeds the maximum representable value (`2^96 - 1`) and is automatically rounded to `8.888888888888888888888888889` (28 eights total with 27 after the decimal point). Decimo's `Decimal128` type is similar to `System.Decimal` (C#/.NET), `rust_decimal` in Rust, `DECIMAL/NUMERIC` in SQL Server, etc.
 [^bigint]: The `BigInt` implementation uses a base-2^32 representation with a little-endian format, where the least significant word is stored at index 0. Each word is a `UInt32`, allowing for efficient storage and arithmetic operations on large integers. This design choice optimizes performance for binary computations while still supporting arbitrary precision.
 [^auxiliary]: The auxiliary types include a base-10 arbitrary-precision signed integer type (`BigInt10`) and a base-10 arbitrary-precision unsigned integer type (`BigUInt`) supporting unlimited digits[^bigint10]. `BigUInt` is used as the internal representation for `BigInt10` and `Decimal`.
diff --git a/pixi.toml b/pixi.toml
index bdcf8d42..04ffc671 100644
--- a/pixi.toml
+++ b/pixi.toml
@@ -54,6 +54,10 @@ ttoml = "clear && pixi run testtoml"
 # bench
 bench = "pixi run package && bash benches/run_bench.sh"
 
+# gmp/mpfr wrapper (compile C wrapper — no MPFR needed at build time)
+bgmp = "clear && pixi run buildgmp"
+buildgmp = "bash src/decimo/gmp/build_gmp_wrapper.sh"
+
 # bench with debug assertions enabled
 bdec_debug = """clear && pixi run package && cd benches/bigdecimal \
 && pixi run mojo run -I ../ -D ASSERT=all bench.mojo && cd ../.. \
diff --git a/src/decimo/bigfloat/__init__.mojo b/src/decimo/bigfloat/__init__.mojo
new file mode 100644
index 00000000..23cc8b5b
--- /dev/null
+++ b/src/decimo/bigfloat/__init__.mojo
@@ -0,0 +1,30 @@
+# ===----------------------------------------------------------------------=== #
+# Copyright 2025 Yuhao Zhu
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ===----------------------------------------------------------------------=== #
+
+"""Sub-package for arbitrary-precision binary floating-point type.
+
+BigFloat is an MPFR-backed binary float type. It requires MPFR to be installed
+on the system (`brew install mpfr` on macOS, `apt install libmpfr-dev` on Linux).
+
+BigFloat is the fast path for scientific computing at high precision. Every
+operation (sqrt, exp, ln, sin, cos, tan, pi, divide) is a single MPFR call.
+
+For exact decimal arithmetic without external dependencies, use BigDecimal instead.
+
+Modules:
+- bigfloat: Core BigFloat struct with constructors, arithmetic, transcendentals
+- mpfr_wrapper: Low-level FFI bindings to the MPFR C wrapper
+"""
diff --git a/src/decimo/bigfloat/bigfloat.mojo b/src/decimo/bigfloat/bigfloat.mojo
new file mode 100644
index 00000000..d0968139
--- /dev/null
+++ b/src/decimo/bigfloat/bigfloat.mojo
@@ -0,0 +1,98 @@
+# ===----------------------------------------------------------------------=== #
+# Copyright 2025 Yuhao Zhu
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ===----------------------------------------------------------------------=== #
+
+"""Implements the BigFloat type: arbitrary-precision binary floating-point.
+
+BigFloat wraps a single MPFR handle via a C wrapper. Every arithmetic and
+transcendental operation is a single MPFR call. Requires MPFR at runtime.
+
+Usage:
+    from decimo.bigfloat.bigfloat import BigFloat
+
+    var x = BigFloat("3.14159", precision=1000)
+    var r = x.sqrt()
+    var bd = r.to_bigdecimal(1000)
+
+Design:
+    - Single field: `handle: Int32` (index into C wrapper's mpfr_t handle pool)
+    - Precision specified in decimal digits, converted to bits internally
+    - Guard bits (64 extra) ensure requested decimal digits are correct
+    - RAII: destructor frees MPFR handle via `mpfrw_clear`
+"""
+
+from decimo.bigfloat.mpfr_wrapper import (
+    mpfrw_available,
+    mpfrw_init,
+    mpfrw_clear,
+    mpfrw_set_str,
+    mpfrw_get_str,
+    mpfrw_free_str,
+    mpfrw_add,
+    mpfrw_sub,
+    mpfrw_mul,
+    mpfrw_div,
+    mpfrw_neg,
+    mpfrw_abs,
+    mpfrw_cmp,
+    mpfrw_sqrt,
+    mpfrw_exp,
+    mpfrw_log,
+    mpfrw_sin,
+    mpfrw_cos,
+    mpfrw_tan,
+    mpfrw_pow,
+    mpfrw_rootn_ui,
+    mpfrw_const_pi,
+)
+
+# Guard bits added to user-requested precision to absorb binary↔decimal rounding.
+comptime _GUARD_BITS: Int = 64
+
+# Approximate bits per decimal digit: ceil(log2(10)) ≈ 3.322 → use 4 for safety.
+comptime _BITS_PER_DIGIT: Int = 4
+
+
+fn _dps_to_bits(precision: Int) -> Int:
+    """Converts decimal digit precision to MPFR bit precision with guard bits.
+    """
+    return precision * _BITS_PER_DIGIT + _GUARD_BITS
+
+
+# TODO: Implement BigFloat struct.
+#
+# struct BigFloat:
+#     var handle: Int32
+#
+#     fn __init__(out self, value: String, precision: Int) raises:
+#         ...
+#
+#     fn __init__(out self, bd: BigDecimal, precision: Int) raises:
+#         ...
+#
+#     fn sqrt(self) raises -> Self:
+#         ...
+#
+#     fn exp(self) raises -> Self:
+#         ...
+#
+#     fn ln(self) raises -> Self:
+#         ...
+#
+#     fn to_bigdecimal(self, precision: Int) raises -> BigDecimal:
+#         ...
+#
+#     fn __del__(owned self):
+#         mpfrw_clear(self.handle)
diff --git a/src/decimo/bigfloat/mpfr_wrapper.mojo b/src/decimo/bigfloat/mpfr_wrapper.mojo
new file mode 100644
index 00000000..d437ffb2
--- /dev/null
+++ b/src/decimo/bigfloat/mpfr_wrapper.mojo
@@ -0,0 +1,198 @@
+# ===----------------------------------------------------------------------=== #
+# Copyright 2025 Yuhao Zhu
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ===----------------------------------------------------------------------=== #
+
+"""Low-level FFI bindings to the MPFR C wrapper (libdecimo_gmp_wrapper).
+
+This module provides thin Mojo wrappers around the C functions in
+`gmp_wrapper.c`. The C wrapper uses `dlopen`/`dlsym` to load MPFR lazily
+at runtime, so this code compiles and links without MPFR headers or libraries.
+
+The C wrapper manages a pool of `mpfr_t` handles. Each handle is identified
+by an `Int32` index. The Mojo side never touches raw `mpfr_t` pointers.
+
+All functions here are internal — users interact with `BigFloat`, not these.
+"""
+
+from std.ffi import external_call, c_int, c_char
+
+# ===----------------------------------------------------------------------=== #
+# Availability check
+# ===----------------------------------------------------------------------=== #
+
+
+fn mpfrw_available() -> Bool:
+    """Checks if MPFR is available on this system.
+
+    Returns True if the C wrapper successfully loaded libmpfr via dlopen.
+    First call triggers the lazy load attempt. Result is cached in C.
+    """
+    var result = external_call["mpfrw_available", c_int]()
+    return result != 0
+
+
+# ===----------------------------------------------------------------------=== #
+# Handle lifecycle
+# ===----------------------------------------------------------------------=== #
+
+
+fn mpfrw_init(prec_bits: Int) -> Int32:
+    """Allocates an MPFR handle with the given precision in bits.
+
+    Returns a handle index (>= 0) on success, or -1 if the pool is full
+    or MPFR is unavailable.
+    """
+    return external_call["mpfrw_init", Int32](Int32(prec_bits))
+
+
+fn mpfrw_clear(handle: Int32):
+    """Frees an MPFR handle, returning it to the pool."""
+    external_call["mpfrw_clear", NoneType](handle)
+
+
+# ===----------------------------------------------------------------------=== #
+# String conversion
+#
+# C returns malloc'd char* which Mojo can't represent as UnsafePointer[c_char]
+# in return position (origin parameter can't be inferred). So we pass raw
+# addresses as Int and reconstruct UnsafePointer at the call site.
+# ===----------------------------------------------------------------------=== #
+
+
+fn mpfrw_set_str(
+    handle: Int32, s: UnsafePointer[c_char, _], length: Int32
+) -> Int32:
+    """Sets the MPFR handle value from a decimal string.
+
+    Args:
+        handle: MPFR handle index.
+        s: Pointer to a decimal string (e.g. "-3.14159").
+        length: Length of the string (for safety, not relying on null terminator).
+
+    Returns 0 on success, non-zero on parse error.
+    """
+    return external_call["mpfrw_set_str", Int32](handle, s, length)
+
+
+fn mpfrw_get_str(handle: Int32, digits: Int32) -> Int:
+    """Exports the MPFR handle value as a decimal string.
+
+    Args:
+        handle: MPFR handle index.
+        digits: Number of significant decimal digits to export.
+
+    Returns raw address of a null-terminated C string (malloc'd).
+    Caller must free it with `mpfrw_free_str`.
+    """
+    return external_call["mpfrw_get_str", Int](handle, digits)
+
+
+fn mpfrw_free_str(addr: Int):
+    """Frees a string returned by `mpfrw_get_str`.
+
+    Args:
+        addr: Raw address returned by `mpfrw_get_str`.
+    """
+    external_call["mpfrw_free_str", NoneType](addr)
+
+
+# ===----------------------------------------------------------------------=== #
+# Arithmetic operations
+# ===----------------------------------------------------------------------=== #
+
+
+fn mpfrw_add(result: Int32, a: Int32, b: Int32):
+    """Computes result = a + b (MPFR_RNDN)."""
+    external_call["mpfrw_add", NoneType](result, a, b)
+
+
+fn mpfrw_sub(result: Int32, a: Int32, b: Int32):
+    """Computes result = a - b (MPFR_RNDN)."""
+    external_call["mpfrw_sub", NoneType](result, a, b)
+
+
+fn mpfrw_mul(result: Int32, a: Int32, b: Int32):
+    """Computes result = a * b (MPFR_RNDN)."""
+    external_call["mpfrw_mul", NoneType](result, a, b)
+
+
+fn mpfrw_div(result: Int32, a: Int32, b: Int32):
+    """Computes result = a / b (MPFR_RNDN)."""
+    external_call["mpfrw_div", NoneType](result, a, b)
+
+
+fn mpfrw_neg(result: Int32, a: Int32):
+    """Computes result = -a (MPFR_RNDN)."""
+    external_call["mpfrw_neg", NoneType](result, a)
+
+
+fn mpfrw_abs(result: Int32, a: Int32):
+    """Computes result = |a| (MPFR_RNDN)."""
+    external_call["mpfrw_abs", NoneType](result, a)
+
+
+fn mpfrw_cmp(a: Int32, b: Int32) -> Int32:
+    """Compares a and b. Returns <0, 0, or >0."""
+    return external_call["mpfrw_cmp", Int32](a, b)
+
+
+# ===----------------------------------------------------------------------=== #
+# Transcendental operations
+# ===----------------------------------------------------------------------=== #
+
+
+fn mpfrw_sqrt(result: Int32, a: Int32):
+    """Computes result = sqrt(a) (MPFR_RNDN)."""
+    external_call["mpfrw_sqrt", NoneType](result, a)
+
+
+fn mpfrw_exp(result: Int32, a: Int32):
+    """Computes result = exp(a) (MPFR_RNDN)."""
+    external_call["mpfrw_exp", NoneType](result, a)
+
+
+fn mpfrw_log(result: Int32, a: Int32):
+    """Computes result = ln(a) (MPFR_RNDN)."""
+    external_call["mpfrw_log", NoneType](result, a)
+
+
+fn mpfrw_sin(result: Int32, a: Int32):
+    """Computes result = sin(a) (MPFR_RNDN)."""
+    external_call["mpfrw_sin", NoneType](result, a)
+
+
+fn mpfrw_cos(result: Int32, a: Int32):
+    """Computes result = cos(a) (MPFR_RNDN)."""
+    external_call["mpfrw_cos", NoneType](result, a)
+
+
+fn mpfrw_tan(result: Int32, a: Int32):
+    """Computes result = tan(a) (MPFR_RNDN)."""
+    external_call["mpfrw_tan", NoneType](result, a)
+
+
+fn mpfrw_pow(result: Int32, a: Int32, b: Int32):
+    """Computes result = a^b (MPFR_RNDN)."""
+    external_call["mpfrw_pow", NoneType](result, a, b)
+
+
+fn mpfrw_rootn_ui(result: Int32, a: Int32, n: UInt32):
+    """Computes result = a^(1/n) (MPFR_RNDN), the n-th root."""
+    external_call["mpfrw_rootn_ui", NoneType](result, a, n)
+
+
+fn mpfrw_const_pi(result: Int32):
+    """Computes result = π (MPFR_RNDN) to the handle's precision."""
+    external_call["mpfrw_const_pi", NoneType](result)
diff --git a/src/decimo/gmp/README.md b/src/decimo/gmp/README.md
new file mode 100644
index 00000000..277f1595
--- /dev/null
+++ b/src/decimo/gmp/README.md
@@ -0,0 +1,43 @@
+# GMP/MPFR C Wrapper
+
+This directory contains the C wrapper that bridges Mojo to MPFR/GMP via FFI.
+
+## Architecture
+
+The wrapper uses `dlopen`/`dlsym` to load MPFR lazily at **runtime**. This means:
+
+- **Build time**: Only a C compiler is needed. No MPFR/GMP headers or libraries required.
+- **Runtime**: If MPFR is installed, `mpfrw_available()` returns 1 and all operations work.
+  If not installed, it returns 0 and BigFloat raises a clear error.
+
+## Build
+
+```bash
+bash src/decimo/gmp/build_gmp_wrapper.sh
+```
+
+## Files
+
+- `gmp_wrapper.c` — The C wrapper source. Manages an `mpfr_t` handle pool and
+  provides thin wrappers around MPFR functions.
+- `build_gmp_wrapper.sh` — Build script for macOS and Linux.
+- `libdecimo_gmp_wrapper.dylib` / `.so` — Compiled output (not checked in).
+
+## Installing MPFR
+
+```bash
+# macOS (Homebrew)
+brew install mpfr
+
+# Linux (Debian/Ubuntu)
+sudo apt install libmpfr-dev
+
+# Verify
+ls /opt/homebrew/lib/libmpfr.dylib   # macOS ARM64
+ls /usr/lib/x86_64-linux-gnu/libmpfr.so  # Linux x86_64
+```
+
+## Environment Variables
+
+- `DECIMO_NOGMP=1` — Force-disable MPFR loading even if installed. Useful for testing
+  pure-Mojo fallback paths.
diff --git a/src/decimo/gmp/build_gmp_wrapper.sh b/src/decimo/gmp/build_gmp_wrapper.sh
new file mode 100755
index 00000000..45563891
--- /dev/null
+++ b/src/decimo/gmp/build_gmp_wrapper.sh
@@ -0,0 +1,41 @@
+#!/bin/bash
+# ===----------------------------------------------------------------------=== #
+# Build script for the Decimo GMP/MPFR C wrapper.
+#
+# This compiles gmp_wrapper.c into a shared library. The wrapper uses
+# dlopen/dlsym to load MPFR at runtime, so it does NOT need -lmpfr or -lgmp
+# at compile time. Any system with a C compiler can build this.
+#
+# Usage:
+#   bash src/decimo/gmp/build_gmp_wrapper.sh
+#
+# Output:
+#   src/decimo/gmp/libdecimo_gmp_wrapper.dylib  (macOS)
+#   src/decimo/gmp/libdecimo_gmp_wrapper.so      (Linux)
+#
+# Prerequisites:
+#   - A C compiler (cc, gcc, or clang)
+#   - For actual MPFR use at runtime: brew install mpfr (macOS)
+#     or apt install libmpfr-dev (Linux)
+# ===----------------------------------------------------------------------=== #
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+SRC="$SCRIPT_DIR/gmp_wrapper.c"
+
+OS="$(uname)"
+if [[ "$OS" == "Darwin" ]]; then
+    OUT="$SCRIPT_DIR/libdecimo_gmp_wrapper.dylib"
+    cc -shared -O2 -o "$OUT" "$SRC"
+    # Set install name so the loader finds it via rpath
+    install_name_tool -id @rpath/libdecimo_gmp_wrapper.dylib "$OUT"
+    echo "Built: $OUT"
+elif [[ "$OS" == "Linux" ]]; then
+    OUT="$SCRIPT_DIR/libdecimo_gmp_wrapper.so"
+    cc -shared -O2 -fPIC -o "$OUT" "$SRC" -ldl
+    echo "Built: $OUT"
+else
+    echo "Unsupported OS: $OS"
+    exit 1
+fi
diff --git a/src/decimo/gmp/gmp_wrapper.c b/src/decimo/gmp/gmp_wrapper.c
new file mode 100644
index 00000000..39092db7
--- /dev/null
+++ b/src/decimo/gmp/gmp_wrapper.c
@@ -0,0 +1,374 @@
+/*
+ * ===----------------------------------------------------------------------=== *
+ * Copyright 2025 Yuhao Zhu
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ * ===----------------------------------------------------------------------=== *
+ *
+ * gmp_wrapper.c — C bridge between Mojo and MPFR/GMP.
+ *
+ * This file uses dlopen/dlsym to load MPFR lazily at runtime. It compiles
+ * on any system with a C compiler — no MPFR/GMP headers needed at build time.
+ *
+ * License note: This file does not contain any code copied from the GMP or
+ * MPFR projects. All function signatures were written from scratch based on
+ * the publicly documented MPFR C API. The only interaction with MPFR/GMP is
+ * through runtime symbol resolution via dlsym. MPFR (LGPLv3+) and GMP
+ * (LGPLv3+ or GPLv2+) are loaded dynamically only if installed by the user.
+ *
+ * Architecture:
+ *   - A pool of mpfr_t handles (fixed-size array, index-based)
+ *   - Mojo side only sees Int32 handle indices
+ *   - All MPFR functions resolved via dlsym at first use
+ *   - If MPFR is not installed, mpfrw_available() returns 0
+ *
+ * Build:
+ *   cc -shared -O2 -o libdecimo_gmp_wrapper.dylib gmp_wrapper.c -ldl
+ *   (No -lmpfr or -lgmp needed — loaded at runtime via dlopen)
+ */
+
+#include <dlfcn.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* ===----------------------------------------------------------------------=== *
+ * Constants
+ * ===----------------------------------------------------------------------=== */
+
+#define MAX_HANDLES     4096
+#define MAX_PREC_BITS   1048576  /* 1M bits ≈ 315K decimal digits — safety cap */
+
+/* MPFR rounding mode: round to nearest (ties to even) */
+#define MPFR_RNDN 0
+
+/* ===----------------------------------------------------------------------=== *
+ * MPFR function pointer typedefs
+ *
+ * These match the MPFR C API signatures. We resolve them via dlsym so we
+ * don't need mpfr.h at compile time.
+ * ===----------------------------------------------------------------------=== */
+
+/* mpfr_t is internally a struct — we allocate raw bytes per handle */
+#define MPFR_T_SIZE 64  /* sizeof(mpfr_t) on most platforms; verified on ARM64 */
+
+typedef int   (*fn_mpfr_init2)(void *x, long prec);
+typedef void  (*fn_mpfr_clear)(void *x);
+typedef int   (*fn_mpfr_set_str)(void *rop, const char *s, int base, int rnd);
+typedef char* (*fn_mpfr_get_str)(char *str, long *exp, int base, size_t n, const void *op, int rnd);
+typedef void  (*fn_mpfr_free_str)(char *str);
+typedef int   (*fn_mpfr_add)(void *rop, const void *op1, const void *op2, int rnd);
+typedef int   (*fn_mpfr_sub)(void *rop, const void *op1, const void *op2, int rnd);
+typedef int   (*fn_mpfr_mul)(void *rop, const void *op1, const void *op2, int rnd);
+typedef int   (*fn_mpfr_div)(void *rop, const void *op1, const void *op2, int rnd);
+typedef int   (*fn_mpfr_neg)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_abs)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_cmp)(const void *op1, const void *op2);
+typedef int   (*fn_mpfr_sqrt)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_exp)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_log)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_sin)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_cos)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_tan)(void *rop, const void *op, int rnd);
+typedef int   (*fn_mpfr_pow)(void *rop, const void *op1, const void *op2, int rnd);
+typedef int   (*fn_mpfr_rootn_ui)(void *rop, const void *op, unsigned long n, int rnd);
+typedef int   (*fn_mpfr_const_pi)(void *rop, int rnd);
+typedef long  (*fn_mpfr_get_prec)(const void *x);
+typedef int   (*fn_mpfr_snprintf)(char *buf, size_t n, const char *fmt, ...);
+
+/* ===----------------------------------------------------------------------=== *
+ * Static state
+ * ===----------------------------------------------------------------------=== */
+
+static void *mpfr_lib = NULL;
+static int   mpfr_load_attempted = 0;
+
+/* Function pointers — resolved by mpfrw_load() */
+static fn_mpfr_init2      p_init2      = NULL;
+static fn_mpfr_clear      p_clear      = NULL;
+static fn_mpfr_set_str    p_set_str    = NULL;
+static fn_mpfr_get_str    p_get_str    = NULL;
+static fn_mpfr_free_str   p_free_str   = NULL;
+static fn_mpfr_add        p_add        = NULL;
+static fn_mpfr_sub        p_sub        = NULL;
+static fn_mpfr_mul        p_mul        = NULL;
+static fn_mpfr_div        p_div        = NULL;
+static fn_mpfr_neg        p_neg        = NULL;
+static fn_mpfr_abs        p_abs        = NULL;
+static fn_mpfr_cmp        p_cmp        = NULL;
+static fn_mpfr_sqrt       p_sqrt       = NULL;
+static fn_mpfr_exp        p_exp        = NULL;
+static fn_mpfr_log        p_log        = NULL;
+static fn_mpfr_sin        p_sin        = NULL;
+static fn_mpfr_cos        p_cos        = NULL;
+static fn_mpfr_tan        p_tan        = NULL;
+static fn_mpfr_pow        p_pow        = NULL;
+static fn_mpfr_rootn_ui   p_rootn_ui   = NULL;
+static fn_mpfr_const_pi   p_const_pi   = NULL;
+static fn_mpfr_snprintf   p_snprintf   = NULL;
+
+/* Handle pool: raw byte arrays sized to hold mpfr_t */
+static char  handle_pool[MAX_HANDLES][MPFR_T_SIZE];
+static int   handle_in_use[MAX_HANDLES];  /* 0 = free, 1 = in use */
+
+/* ===----------------------------------------------------------------------=== *
+ * Library loading (dlopen/dlsym)
+ * ===----------------------------------------------------------------------=== */
+
+static int mpfrw_load(void) {
+    if (mpfr_load_attempted) return (mpfr_lib != NULL);
+    mpfr_load_attempted = 1;
+
+    /* Check DECIMO_NOGMP environment variable */
+    const char *nogmp = getenv("DECIMO_NOGMP");
+    if (nogmp && nogmp[0] != '0' && nogmp[0] != '\0') {
+        return 0;
+    }
+
+    /* Try common library paths */
+    const char *paths[] = {
+        "libmpfr.dylib",                          /* macOS via rpath        */
+        "/opt/homebrew/lib/libmpfr.dylib",         /* macOS ARM64 (Homebrew) */
+        "/usr/local/lib/libmpfr.dylib",            /* macOS x86_64           */
+        "libmpfr.so",                              /* Linux via rpath        */
+        "/usr/lib/x86_64-linux-gnu/libmpfr.so",   /* Debian/Ubuntu          */
+        "/usr/lib/aarch64-linux-gnu/libmpfr.so",   /* Debian ARM64           */
+        "/usr/lib/libmpfr.so",                     /* Generic Linux          */
+        NULL
+    };
+
+    for (int i = 0; paths[i]; i++) {
+        mpfr_lib = dlopen(paths[i], RTLD_LAZY);
+        if (mpfr_lib) break;
+    }
+    if (!mpfr_lib) return 0;
+
+    /* Resolve all function pointers */
+    p_init2    = (fn_mpfr_init2)      dlsym(mpfr_lib, "mpfr_init2");
+    p_clear    = (fn_mpfr_clear)      dlsym(mpfr_lib, "mpfr_clear");
+    p_set_str  = (fn_mpfr_set_str)    dlsym(mpfr_lib, "mpfr_set_str");
+    p_get_str  = (fn_mpfr_get_str)    dlsym(mpfr_lib, "mpfr_get_str");
+    p_free_str = (fn_mpfr_free_str)   dlsym(mpfr_lib, "mpfr_free_str");
+    p_add      = (fn_mpfr_add)        dlsym(mpfr_lib, "mpfr_add");
+    p_sub      = (fn_mpfr_sub)        dlsym(mpfr_lib, "mpfr_sub");
+    p_mul      = (fn_mpfr_mul)        dlsym(mpfr_lib, "mpfr_mul");
+    p_div      = (fn_mpfr_div)        dlsym(mpfr_lib, "mpfr_div");
+    p_neg      = (fn_mpfr_neg)        dlsym(mpfr_lib, "mpfr_neg");
+    p_abs      = (fn_mpfr_abs)        dlsym(mpfr_lib, "mpfr_abs");
+    p_cmp      = (fn_mpfr_cmp)        dlsym(mpfr_lib, "mpfr_cmp");
+    p_sqrt     = (fn_mpfr_sqrt)       dlsym(mpfr_lib, "mpfr_sqrt");
+    p_exp      = (fn_mpfr_exp)        dlsym(mpfr_lib, "mpfr_exp");
+    p_log      = (fn_mpfr_log)        dlsym(mpfr_lib, "mpfr_log");
+    p_sin      = (fn_mpfr_sin)        dlsym(mpfr_lib, "mpfr_sin");
+    p_cos      = (fn_mpfr_cos)        dlsym(mpfr_lib, "mpfr_cos");
+    p_tan      = (fn_mpfr_tan)        dlsym(mpfr_lib, "mpfr_tan");
+    p_pow      = (fn_mpfr_pow)        dlsym(mpfr_lib, "mpfr_pow");
+    p_rootn_ui = (fn_mpfr_rootn_ui)   dlsym(mpfr_lib, "mpfr_rootn_ui");
+    p_const_pi = (fn_mpfr_const_pi)   dlsym(mpfr_lib, "mpfr_const_pi");
+    p_snprintf = (fn_mpfr_snprintf)   dlsym(mpfr_lib, "mpfr_snprintf");
+
+    /* Verify critical function pointers */
+    if (!p_init2 || !p_clear || !p_set_str || !p_get_str) {
+        dlclose(mpfr_lib);
+        mpfr_lib = NULL;
+        return 0;
+    }
+
+    /* Initialize handle pool */
+    memset(handle_in_use, 0, sizeof(handle_in_use));
+
+    return 1;
+}
+
+/* ===----------------------------------------------------------------------=== *
+ * Public API — called from Mojo via external_call
+ * ===----------------------------------------------------------------------=== */
+
+int mpfrw_available(void) {
+    return mpfrw_load();
+}
+
+int mpfrw_init(int prec_bits) {
+    if (!mpfrw_load()) return -1;
+    if (prec_bits < 2) prec_bits = 2;
+    if (prec_bits > MAX_PREC_BITS) return -1;  /* precision cap */
+
+    /* Find a free handle slot */
+    for (int i = 0; i < MAX_HANDLES; i++) {
+        if (!handle_in_use[i]) {
+            handle_in_use[i] = 1;
+            p_init2(&handle_pool[i], (long)prec_bits);
+            return i;
+        }
+    }
+    return -1;  /* pool exhausted */
+}
+
+void mpfrw_clear(int handle) {
+    if (handle < 0 || handle >= MAX_HANDLES) return;
+    if (!handle_in_use[handle]) return;
+    p_clear(&handle_pool[handle]);
+    handle_in_use[handle] = 0;
+}
+
+int mpfrw_set_str(int handle, const char *s, int length) {
+    if (handle < 0 || handle >= MAX_HANDLES || !handle_in_use[handle]) return -1;
+
+    /* Copy to null-terminated buffer (s may not be null-terminated) */
+    char *buf = (char *)malloc(length + 1);
+    if (!buf) return -1;
+    memcpy(buf, s, length);
+    buf[length] = '\0';
+
+    int ret = p_set_str(&handle_pool[handle], buf, 10, MPFR_RNDN);
+    free(buf);
+    return ret;
+}
+
+char* mpfrw_get_str(int handle, int digits) {
+    if (handle < 0 || handle >= MAX_HANDLES || !handle_in_use[handle]) return NULL;
+    if (digits < 1) digits = 1;
+
+    /* Use mpfr_snprintf to format as decimal string */
+    /* Format: %.<digits>RNf for fixed notation, %.<digits>RNg for general */
+    int bufsize = digits + 64;  /* extra space for sign, dot, exponent */
+    char *buf = (char *)malloc(bufsize);
+    if (!buf) return NULL;
+
+    if (p_snprintf) {
+        /* %.*Rg: general format with N significant digits, round to nearest */
+        p_snprintf(buf, bufsize, "%.*Rg", digits, &handle_pool[handle]);
+    } else {
+        /* Fallback: use mpfr_get_str + manual formatting */
+        long exp;
+        char *raw = p_get_str(NULL, &exp, 10, digits, &handle_pool[handle], MPFR_RNDN);
+        if (!raw) { free(buf); return NULL; }
+        /* TODO: Format raw mantissa + exponent into decimal string */
+        snprintf(buf, bufsize, "%sE%ld", raw, exp);
+        if (p_free_str) p_free_str(raw);
+    }
+    return buf;
+}
+
+void mpfrw_free_str(char *s) {
+    if (s) free(s);
+}
+
+/* ===----------------------------------------------------------------------=== *
+ * Arithmetic operations
+ * ===----------------------------------------------------------------------=== */
+
+#define CHECK_HANDLES_2(r, a) \
+    if ((r) < 0 || (r) >= MAX_HANDLES || !handle_in_use[(r)]) return; \
+    if ((a) < 0 || (a) >= MAX_HANDLES || !handle_in_use[(a)]) return;
+
+#define CHECK_HANDLES_3(r, a, b) \
+    CHECK_HANDLES_2(r, a) \
+    if ((b) < 0 || (b) >= MAX_HANDLES || !handle_in_use[(b)]) return;
+
+void mpfrw_add(int r, int a, int b) {
+    CHECK_HANDLES_3(r, a, b);
+    p_add(&handle_pool[r], &handle_pool[a], &handle_pool[b], MPFR_RNDN);
+}
+
+void mpfrw_sub(int r, int a, int b) {
+    CHECK_HANDLES_3(r, a, b);
+    p_sub(&handle_pool[r], &handle_pool[a], &handle_pool[b], MPFR_RNDN);
+}
+
+void mpfrw_mul(int r, int a, int b) {
+    CHECK_HANDLES_3(r, a, b);
+    p_mul(&handle_pool[r], &handle_pool[a], &handle_pool[b], MPFR_RNDN);
+}
+
+void mpfrw_div(int r, int a, int b) {
+    CHECK_HANDLES_3(r, a, b);
+    p_div(&handle_pool[r], &handle_pool[a], &handle_pool[b], MPFR_RNDN);
+}
+
+void mpfrw_neg(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    p_neg(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+void mpfrw_abs(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    p_abs(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+int mpfrw_cmp(int a, int b) {
+    if (a < 0 || a >= MAX_HANDLES || !handle_in_use[a]) return 0;
+    if (b < 0 || b >= MAX_HANDLES || !handle_in_use[b]) return 0;
+    return p_cmp(&handle_pool[a], &handle_pool[b]);
+}
+
+/* ===----------------------------------------------------------------------=== *
+ * Transcendental operations
+ * ===----------------------------------------------------------------------=== */
+
+void mpfrw_sqrt(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    if (p_sqrt) p_sqrt(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+void mpfrw_exp(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    if (p_exp) p_exp(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+void mpfrw_log(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    if (p_log) p_log(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+void mpfrw_sin(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    if (p_sin) p_sin(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+void mpfrw_cos(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    if (p_cos) p_cos(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+void mpfrw_tan(int r, int a) {
+    CHECK_HANDLES_2(r, a);
+    if (p_tan) p_tan(&handle_pool[r], &handle_pool[a], MPFR_RNDN);
+}
+
+void mpfrw_pow(int r, int a, int b) {
+    CHECK_HANDLES_3(r, a, b);
+    if (p_pow) p_pow(&handle_pool[r], &handle_pool[a], &handle_pool[b], MPFR_RNDN);
+}
+
+void mpfrw_rootn_ui(int r, int a, unsigned int n) {
+    CHECK_HANDLES_2(r, a);
+    if (p_rootn_ui) p_rootn_ui(&handle_pool[r], &handle_pool[a], (unsigned long)n, MPFR_RNDN);
+}
+
+void mpfrw_const_pi(int r) {
+    if (r < 0 || r >= MAX_HANDLES || !handle_in_use[r]) return;
+    if (p_const_pi) p_const_pi(&handle_pool[r], MPFR_RNDN);
+}
+
+/* ===----------------------------------------------------------------------=== *
+ * Diagnostics
+ * ===----------------------------------------------------------------------=== */
+
+int mpfrw_handles_in_use(void) {
+    int count = 0;
+    for (int i = 0; i < MAX_HANDLES; i++) {
+        if (handle_in_use[i]) count++;
+    }
+    return count;
+}

From 5b6f9f36857020aad2145444c84199056ecb04dc Mon Sep 17 00:00:00 2001
From: ZHU Yuhao <dr.yuhao.zhu@outlook.com>
Date: Sat, 4 Apr 2026 01:07:22 +0200
Subject: [PATCH 6/6] Fix comments

---
 docs/plans/gmp_integration.md         |  4 ++--
 src/decimo/bigfloat/mpfr_wrapper.mojo |  2 +-
 src/decimo/gmp/README.md              | 12 +++++++---
 src/decimo/gmp/gmp_wrapper.c          | 33 +++++++++++++++++----------
 4 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/docs/plans/gmp_integration.md b/docs/plans/gmp_integration.md
index 7ce47e9a..c02acd0e 100644
--- a/docs/plans/gmp_integration.md
+++ b/docs/plans/gmp_integration.md
@@ -92,7 +92,7 @@ passes and receives only primitive types (`Int32`, `Int`).
 ### 3.1 Available FFI Mechanism
 
 ```mojo
-from sys.ffi import external_call
+from std.ffi import external_call
 
 # Example: calling a C function
 var result = external_call["gmpw_add", NoneType, Int32, Int32, Int32](a, b, c)
@@ -1406,7 +1406,7 @@ const char* gmpw_version(void);
 ### 15.3 Sample Mojo FFI Calls
 
 ```mojo
-from sys.ffi import external_call, c_int
+from std.ffi import external_call, c_int
 
 # ⚠️ SAFE pattern: pass length explicitly to avoid use-after-free (see Section 16.1)
 def gmp_set_str(h: c_int, s: String):
diff --git a/src/decimo/bigfloat/mpfr_wrapper.mojo b/src/decimo/bigfloat/mpfr_wrapper.mojo
index d437ffb2..9dfa42cd 100644
--- a/src/decimo/bigfloat/mpfr_wrapper.mojo
+++ b/src/decimo/bigfloat/mpfr_wrapper.mojo
@@ -144,7 +144,7 @@ fn mpfrw_abs(result: Int32, a: Int32):
 
 
 fn mpfrw_cmp(a: Int32, b: Int32) -> Int32:
-    """Compares a and b. Returns <0, 0, or >0."""
+    """Compares a and b. Returns <0, 0, or >0. Returns -2 on invalid handle."""
     return external_call["mpfrw_cmp", Int32](a, b)
 
 
diff --git a/src/decimo/gmp/README.md b/src/decimo/gmp/README.md
index 277f1595..51917229 100644
--- a/src/decimo/gmp/README.md
+++ b/src/decimo/gmp/README.md
@@ -6,9 +6,15 @@ This directory contains the C wrapper that bridges Mojo to MPFR/GMP via FFI.
 
 The wrapper uses `dlopen`/`dlsym` to load MPFR lazily at **runtime**. This means:
 
-- **Build time**: Only a C compiler is needed. No MPFR/GMP headers or libraries required.
-- **Runtime**: If MPFR is installed, `mpfrw_available()` returns 1 and all operations work.
-  If not installed, it returns 0 and BigFloat raises a clear error.
+- **Build time**: Only a C compiler is needed. No MPFR/GMP headers or libraries
+  are required. The wrapper manually declares `mpfr_t` storage and function pointer
+  signatures based on the public MPFR C API.
+- **ABI note**: The `mpfr_t` storage is conservatively over-allocated (64 bytes,
+  16-byte aligned) to accommodate all known MPFR builds. This has been verified
+  on ARM64 macOS and x86_64 Linux. If you encounter issues on an unusual platform,
+  rebuilding with `mpfr.h` included would provide a compile-time size check.
+- **Runtime**: If MPFR is installed, `mpfrw_available()` returns 1 and all
+  operations work. If not installed, it returns 0 and BigFloat raises a clear error.
 
 ## Build
 
diff --git a/src/decimo/gmp/gmp_wrapper.c b/src/decimo/gmp/gmp_wrapper.c
index 39092db7..b2eaf460 100644
--- a/src/decimo/gmp/gmp_wrapper.c
+++ b/src/decimo/gmp/gmp_wrapper.c
@@ -59,10 +59,16 @@
  * don't need mpfr.h at compile time.
  * ===----------------------------------------------------------------------=== */
 
-/* mpfr_t is internally a struct — we allocate raw bytes per handle */
-#define MPFR_T_SIZE 64  /* sizeof(mpfr_t) on most platforms; verified on ARM64 */
-
-typedef int   (*fn_mpfr_init2)(void *x, long prec);
+/* mpfr_t is internally a struct. The size is not guaranteed by the MPFR ABI,
+ * but has been 32 bytes on all tested platforms (ARM64 macOS, x86_64 Linux)
+ * since MPFR 3.x. We over-allocate to 64 bytes and align to 16 bytes for
+ * safety. A runtime assertion in mpfrw_load() verifies the actual size.
+ * If MPFR headers are available at build time, the compile-time static_assert
+ * below catches any mismatch. */
+#define MPFR_T_SIZE 64
+#define MPFR_T_ALIGN 16
+
+typedef void  (*fn_mpfr_init2)(void *x, long prec);
 typedef void  (*fn_mpfr_clear)(void *x);
 typedef int   (*fn_mpfr_set_str)(void *rop, const char *s, int base, int rnd);
 typedef char* (*fn_mpfr_get_str)(char *str, long *exp, int base, size_t n, const void *op, int rnd);
@@ -117,8 +123,8 @@ static fn_mpfr_rootn_ui   p_rootn_ui   = NULL;
 static fn_mpfr_const_pi   p_const_pi   = NULL;
 static fn_mpfr_snprintf   p_snprintf   = NULL;
 
-/* Handle pool: raw byte arrays sized to hold mpfr_t */
-static char  handle_pool[MAX_HANDLES][MPFR_T_SIZE];
+/* Handle pool: byte arrays sized to hold mpfr_t, with proper alignment */
+static _Alignas(MPFR_T_ALIGN) char handle_pool[MAX_HANDLES][MPFR_T_SIZE];
 static int   handle_in_use[MAX_HANDLES];  /* 0 = free, 1 = in use */
 
 /* ===----------------------------------------------------------------------=== *
@@ -137,11 +143,11 @@ static int mpfrw_load(void) {
 
     /* Try common library paths */
     const char *paths[] = {
-        "libmpfr.dylib",                          /* macOS via rpath        */
+        "libmpfr.dylib",                           /* macOS via rpath        */
         "/opt/homebrew/lib/libmpfr.dylib",         /* macOS ARM64 (Homebrew) */
         "/usr/local/lib/libmpfr.dylib",            /* macOS x86_64           */
         "libmpfr.so",                              /* Linux via rpath        */
-        "/usr/lib/x86_64-linux-gnu/libmpfr.so",   /* Debian/Ubuntu          */
+        "/usr/lib/x86_64-linux-gnu/libmpfr.so",    /* Debian/Ubuntu          */
         "/usr/lib/aarch64-linux-gnu/libmpfr.so",   /* Debian ARM64           */
         "/usr/lib/libmpfr.so",                     /* Generic Linux          */
         NULL
@@ -177,8 +183,10 @@ static int mpfrw_load(void) {
     p_const_pi = (fn_mpfr_const_pi)   dlsym(mpfr_lib, "mpfr_const_pi");
     p_snprintf = (fn_mpfr_snprintf)   dlsym(mpfr_lib, "mpfr_snprintf");
 
-    /* Verify critical function pointers */
-    if (!p_init2 || !p_clear || !p_set_str || !p_get_str) {
+    /* Verify critical function pointers — if any core symbol is missing,
+     * the MPFR library is too old or incompatible. Fail the load. */
+    if (!p_init2 || !p_clear || !p_set_str || !p_get_str || !p_free_str ||
+        !p_add || !p_sub || !p_mul || !p_div || !p_neg || !p_abs || !p_cmp) {
         dlclose(mpfr_lib);
         mpfr_lib = NULL;
         return 0;
@@ -217,6 +225,7 @@ int mpfrw_init(int prec_bits) {
 void mpfrw_clear(int handle) {
     if (handle < 0 || handle >= MAX_HANDLES) return;
     if (!handle_in_use[handle]) return;
+    if (!p_clear) return;  /* MPFR not loaded — nothing to free */
     p_clear(&handle_pool[handle]);
     handle_in_use[handle] = 0;
 }
@@ -307,8 +316,8 @@ void mpfrw_abs(int r, int a) {
 }
 
 int mpfrw_cmp(int a, int b) {
-    if (a < 0 || a >= MAX_HANDLES || !handle_in_use[a]) return 0;
-    if (b < 0 || b >= MAX_HANDLES || !handle_in_use[b]) return 0;
+    if (a < 0 || a >= MAX_HANDLES || !handle_in_use[a]) return -2;  /* error sentinel */
+    if (b < 0 || b >= MAX_HANDLES || !handle_in_use[b]) return -2;  /* error sentinel */
     return p_cmp(&handle_pool[a], &handle_pool[b]);
 }