diff --git a/ROBOTS.md b/ROBOTS.md
new file mode 100644
index 000000000..ed98de71e
--- /dev/null
+++ b/ROBOTS.md
@@ -0,0 +1,216 @@
+# ROBOTS.md: Python Generics for Cap’n Proto (pycapnp enhancement)
+
+## 1) Mission
+
+Implement **first-class Python generics for Cap’n Proto schema generics** in `pycapnp` via a new codegen path and a tiny runtime. Developers should be able to write:
+
+```py
+from my_schema.location_generic import Location
+loc = Location[str]("NYC", 40.71, -74.0)
+loc2 = Location[int](2**63, 51.5, -0.12)
+```
+
+…with accurate type hints, minimal overhead, and **no schema changes**.
+
+## 2) Goals (Must)
+
+* Generate **`typing.Generic[...]` classes** for Cap’n Proto generic structs/interfaces.
+* Hide the **pointer-parameter constraint** behind a **codec registry** (automatic wrapping/unwrapping of scalars and custom types).
+* Ship **PEP 561 stubs** so mypy/pyright infer `Location[T].id: T`.
+* Preserve wire compatibility and the existing raw API.
+* Keep overhead negligible versus current `pycapnp` (≤2% in simple construct+serialize microbenchmarks).
+
+## 3) Non-goals (Won’t for v1)
+
+* No HKTs (higher-kinded types).
+* No cross-module generic unification beyond explicit codec registration.
+* No changes to Cap’n Proto core or schema language.
+
+## 4) Outputs / Deliverables
+
+1. **Runtime module**: `capnp/_generic.py` (Codec, registry, helpers).
+2. **Codegen templates** (Jinja or Python format strings):
+
+   * `*_generic.py` wrapper per generic type.
+   * `*_generic.pyi` stubs per generic type.
+   * `*_codecs.py` per schema module (default codec registrations when wrappers exist).
+3. **Compiler flag**: `--python-generics={off|stubs|full}`; default `off`.
+4. **Docs**: new “Python generics” page + migration notes.
+5. **Tests**: unit, typing, and microbenchmarks.
+
+## 5) High-level Design
+
+### 5.1 Runtime (`capnp/_generic.py`)
+
+* `Codec(capnp_type, to_capnp, from_capnp)`.
+* `_registry[schema_module.__name__][py_type] -> Codec`, with MRO fallback on lookup.
+* API: `register_codec(schema_module, py_type, codec)` and `find_codec(schema_module, py_type)`.
+
+### 5.2 Generated Wrapper (per generic type)
+
+Given schema `Foo(P1, P2, ...)`, emit `foo_generic.py` exposing:
+
+```py
+P1 = TypeVar("P1"); P2 = TypeVar("P2")
+class Foo(Generic[P1, P2]):
+    def __init__(self, a1: P1, a2: P2, ...):
+        codec1 = find_codec(_mod, type(a1))
+        codec2 = find_codec(_mod, type(a2))
+        impl = _mod.Foo[codec1.capnp_type, codec2.capnp_type]
+        # construct message, set fields via codecs
+```
+
+* `from_bytes(cls, data: bytes, py_types: tuple[type,...] | Type[P1]...)` loads the **concrete** instantiation and returns a typed wrapper.
+* Properties unwrap via `from_capnp`.
+
+### 5.3 Type Stubs (PEP 561)
+
+* Emit `.pyi` alongside wrappers with `Generic[...]` and precise field types.
+* Provide overloads for common `T` (str/int/bytes/uuid.UUID) to improve inference.
+
+### 5.4 Default Codecs
+
+Emit module-level registrations in `*_codecs.py` (imported by `*_generic.py`):
+
+| Python type | Cap’n Proto pointer type | Wrapper required | to_capnp            | from_capnp            |
+| ----------- | ------------------------ | ---------------- | ------------------- | --------------------- |
+| `str`       | `Text`                   | No               | identity            | `str(r)`              |
+| `bytes`     | `Data` via `BytesRef`    | Yes              | new wrapper message | `bytes(r.value)`      |
+| `int`       | `Number64`               | Yes              | mask → wrapper      | `int(r.value)`        |
+| `uuid.UUID` | `Uuid128`                | Yes              | split → wrapper     | merge → `UUID(int=…)` |
+
+> Emit only when the schema module defines the necessary wrapper structs.
+
+## 6) Codegen Integration
+
+* Extend the Python backend of `capnp compile -opython` to:
+
+  1. Parse the AST: detect generic types and their parameters.
+  2. Emit raw module as today (e.g., `location_capnp.py`).
+  3. If `--python-generics=stubs|full`, render `*_generic.(py|pyi)`.
+  4. If `--python-generics=full`, render `*_codecs.py` and autoload it from the wrapper.
+* Naming: place wrappers next to raw module; suffix `_generic` to avoid import cycles.
+
+## 7) Backward Compatibility
+
+* No behavioral change when flag is `off` (default).
+* Existing imports continue to work (`import location_capnp as capnp`).
+* New API is additive: `from my_schema.location_generic import Location`.
+
+## 8) CLI Flags & Behavior
+
+* `--python-generics=off|stubs|full`.
+* `--python-generics-scalar=wrap|error` (controls default codec emission for scalars).
+* Future: `--python-generics-anypointer` to generate `AnyPointer`-based dynamic wrappers (out of scope for v1).
+
+## 9) Testing Plan
+
+### 9.1 Unit
+
+* Round-trip for `str`, large `int` (≥ `2**63`), `bytes`, `uuid.UUID`.
+* Multi-parameter generic: `Pair(A, B)`.
+* Missing codec → `CodecLookupError` with a helpful message listing the registered codecs.
+* Wrapper-free path (Text/Data) remains zero-copy.
+
+### 9.2 Typing
+
+* `mypy` + `pyright` tests: `reveal_type(Location[str](...).id) -> builtins.str` etc.
+* Overload resolution for `from_bytes(blob, str)`.
+
+### 9.3 Benchmarks (pytest-benchmark)
+
+* Construct + set + `to_bytes_packed` for raw vs generic wrapper.
+* Target: ≤2% overhead median.
+
+## 10) Repository Layout Changes
+
+```
+capnp/
+  _generic.py # NEW runtime
+  codegen/
+    templates/
+      generic_wrapper.py.j2
+      generic_wrapper.pyi.j2
+      module_codecs.py.j2
+  ... existing files ...
+```
+
+## 11) Acceptance Criteria
+
+* ✅ Flagged build emits wrappers/stubs for generic schemas.
+* ✅ The installed package ships `py.typed` and stubs.
+* ✅ Typing tests pass on CPython 3.10–3.13.
+* ✅ Benchmarks meet perf budget.
+* ✅ Docs page with end-to-end example and codec recipe.
+
+## 12) Rollout Plan
+
+1. Land runtime + codegen behind `--python-generics=stubs`.
+2. Add default codecs + `full` mode.
+3. Publish pre-release (`0.x`) to gather feedback.
+4. Stabilize and document integration patterns (dataclasses/Pydantic codecs).
+
+## 13) Risks & Mitigations
+
+* **Codec explosion**: encourage per-module codecs; document conventions.
+* **Unsigned semantics**: recommend `bigint` for JS/TS consumers; in Python, `int` is unbounded.
+* **User surprise on Any**: be explicit that *this is not AnyPointer*; codecs maintain type safety.
+
+## 14) Developer Guide (How to)
+
+1. Build locally: `pip install -e .` (dev) and ensure the `capnp` compiler is on `PATH`.
+2. Compile schema: `capnp compile -opython --python-generics=full schema/location.capnp`.
+3. Use in app:
+
+   ```py
+   from my_schema.location_generic import Location
+   a = Location[str]("NYC", 40.71, -74.0)
+   b = Location[int](2**63, 51.5, -0.12)
+   ```
+4. Register custom type:
+
+   ```py
+   from capnp._generic import register_codec, Codec
+   from my_schema import location_capnp as mod
+   class OrderId: ...  # assumed to wrap an integer exposed as ``n``
+   register_codec(mod, OrderId, Codec(mod.Number64,
+       lambda o: mod.Number64.new_message(value=o.n),
+       lambda r: OrderId(int(r.value))))
+   ```
+
+## 15) Example Schema for CI tests
+
+```capnp
+@0x9a2b3c4d1a2b3c4d;
+struct Location(Id) { id @0 :Id; lat @1 :Float64; lon @2 :Float64; }
+struct Number64 { value @0 :UInt64; }
+struct BytesRef { value @0 :Data; }
+struct Uuid128 { hi @0 :UInt64; lo @1 :UInt64; }
+struct Pair(A, B) { a @0 :A; b @1 :B; }
+```
+
+## 16) CI Matrix
+
+* OS: ubuntu-latest, macos-latest, windows-latest.
+* Py: 3.10, 3.11, 3.12, 3.13.
+* Steps: build, `capnp compile` samples, unit + typing tests, benchmark smoke.
+
+## 17) Documentation TODO
+
+* New page: *Python Generics in pycapnp* with quickstart, codec cookbook, and pitfalls.
+* Update README: reference `--python-generics`.
+
+## 18) Out of Scope / Future Work
+
+* Auto-derivation of codecs from dataclasses/attrs/pydantic via reflection.
+* `AnyPointer` dynamic generic wrapper (type-tagged).
+* IDE plugin to generate wrapper codecs from schema automatically.
+
+## 19) Owner & Contact
+
+* Tech Lead: **Ellis Breen** (proposer)
+* Maintainers to ping: `pycapnp` maintainers; Cap’n Proto community for review.
+
+---
+
+**End of ROBOTS.md**
diff --git a/capnp/_generic.py b/capnp/_generic.py
new file mode 100644
index 000000000..838d73f87
--- /dev/null
+++ b/capnp/_generic.py
@@ -0,0 +1,154 @@
+"""Runtime support for generated Python generic wrappers.
+
+This module exposes a minimal registry that maps Python types to the
+corresponding Cap'n Proto pointer types alongside the conversion functions
+needed to bridge between them. Generated wrappers for generic schema types
+look up codecs through :func:`find_codec` so that user code can seamlessly
+work with rich Python types (``uuid.UUID`` instances, unsigned integers that
+exceed the ``int64`` range, etc.) without having to manually
+interact with the low-level pointer-parameter constraints of Cap'n Proto.
+
+The runtime is intentionally tiny – it keeps just enough state to resolve the
+``Codec`` for a ``(schema_module, python_type)`` pair, with support for
+walking the Python type's MRO to honour registrations on base classes. This
+behaviour mirrors the way ``isinstance`` treats subclasses, which keeps the
+runtime predictable and easy to reason about for library users.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+from types import ModuleType
+from typing import Any, Callable, Dict, Type
+
+__all__ = [
+    "Codec",
+    "CodecRegistrationError",
+    "CodecLookupError",
+    "find_codec",
+    "register_codec",
+]
+
+
+CodecEncoder = Callable[[Any], Any]
+CodecDecoder = Callable[[Any], Any]
+
+
+@dataclass(frozen=True)
+class Codec:
+    """A pair of conversion functions for a Cap'n Proto pointer parameter.
+
+    Parameters
+    ----------
+    capnp_type:
+        The Cap'n Proto pointer type that should be used when instantiating the
+        generic schema. Generated wrappers pass this value directly to the
+        underlying ``capnp`` module when constructing the concrete schema type.
+    to_capnp:
+        A callable that receives a Python value and returns the object that
+        should be written into the Cap'n Proto message. For pointer types this
+        is typically a builder returned by ``new_message`` or, for zero-copy
+        scenarios, a value that can be assigned to the field directly.
+    from_capnp:
+        A callable that receives the reader object retrieved from the Cap'n
+        Proto message and returns the corresponding Python value.
+    """
+
+    capnp_type: Any
+    to_capnp: CodecEncoder
+    from_capnp: CodecDecoder
+
+
+class CodecRegistrationError(TypeError):
+    """Raised when an invalid codec registration is attempted."""
+
+
+class CodecLookupError(LookupError):
+    """Raised when no codec could be found for the requested type."""
+
+
+# The registry is keyed by the schema module and then by the Python type that
+# the codec can serialise. Using the module's ``__name__`` keeps lookups stable
+# even when the same module object is imported under multiple aliases.
+_registry: Dict[str, Dict[Type[Any], Codec]] = {}
+
+
+def _module_key(schema_module: ModuleType | str) -> str:
+    if isinstance(schema_module, ModuleType):
+        return schema_module.__name__
+    if isinstance(schema_module, str):
+        return schema_module
+    raise CodecRegistrationError(
+        "schema_module must be a module or module name, got "
+        f"{type(schema_module)!r}"
+    )
+
+
+def register_codec(schema_module: ModuleType | str, py_type: Type[Any], codec: Codec) -> None:
+    """Register *codec* for ``py_type`` within ``schema_module``.
+
+    Parameters
+    ----------
+    schema_module:
+        The module that hosts the generated Cap'n Proto schema (e.g.
+        ``my_schema.location_capnp``). This can be the module object itself or
+        its qualified name.
+    py_type:
+        The Python type that the codec handles. The codec will be used for the
+        exact type as well as subclasses by virtue of MRO-based lookups.
+    codec:
+        The :class:`Codec` instance describing how to convert between Python
+        values and the Cap'n Proto representation.
+    """
+
+    if not isinstance(py_type, type):
+        raise CodecRegistrationError(
+            "py_type must be a Python type, got " f"{type(py_type)!r}"
+        )
+    if not isinstance(codec, Codec):
+        raise CodecRegistrationError(
+            "codec must be an instance of Codec, got " f"{type(codec)!r}"
+        )
+
+    key = _module_key(schema_module)
+    _registry.setdefault(key, {})[py_type] = codec
+
+
+def find_codec(schema_module: ModuleType | str, py_type: Type[Any] | Any) -> Codec:
+    """Return the codec registered for ``py_type`` within ``schema_module``.
+
+    The lookup walks ``py_type``'s MRO so that codecs registered for a base
+    class automatically apply to subclasses. If no codec is registered, a
+    :class:`CodecLookupError` is raised with a helpful message that lists all
+    known codecs for the schema module.
+    """
+
+    if not isinstance(py_type, type):
+        py_type = type(py_type)
+
+    key = _module_key(schema_module)
+    module_registry = _registry.get(key)
+    if not module_registry:
+        raise CodecLookupError(
+            f"No codecs registered for schema module '{key}', cannot encode type "
+            f"{py_type.__module__}.{py_type.__qualname__}. "
+            "Use register_codec() to provide conversions."
+        )
+
+    for cls in py_type.__mro__:
+        codec = module_registry.get(cls)
+        if codec is not None:
+            return codec
+
+    available = ", ".join(sorted(t.__name__ for t in module_registry)) or ""
+    raise CodecLookupError(
+        f"No codec registered for type {py_type.__module__}.{py_type.__qualname__} "
+        f"in schema module '{key}'. Available codecs: {available}"
+    )
+
+
+def _clear_registry() -> None:
+    """Helper used by tests to reset the registry state."""
+
+    _registry.clear()
+
diff --git a/test/test_generic_runtime.py b/test/test_generic_runtime.py
new file mode 100644
index 000000000..7454ee8fe
--- /dev/null
+++ b/test/test_generic_runtime.py
@@ -0,0 +1,116 @@
+import types
+
+import pytest
+
+import importlib.util
+import pathlib
+import sys
+
+_PACKAGE_NAME = "capnp"
+_ROOT = pathlib.Path(__file__).resolve().parents[1]
+_GENERIC_PATH = _ROOT / "capnp" / "_generic.py"
+
+package = sys.modules.setdefault(_PACKAGE_NAME, types.ModuleType(_PACKAGE_NAME))
+package.__path__ = [str((_ROOT / "capnp").resolve())]
+
+spec = importlib.util.spec_from_file_location(f"{_PACKAGE_NAME}._generic", _GENERIC_PATH)
+module = importlib.util.module_from_spec(spec)
+sys.modules.setdefault(f"{_PACKAGE_NAME}._generic", module)
+assert spec.loader is not None
+spec.loader.exec_module(module)
+
+Codec = module.Codec
+CodecLookupError = module.CodecLookupError
+CodecRegistrationError = module.CodecRegistrationError
+_clear_registry = module._clear_registry
+find_codec = module.find_codec
+register_codec = module.register_codec
+
+
+class _Base:
+    pass
+
+
+class _Child(_Base):
+    pass
+
+
+def setup_function(function):
+    _clear_registry()
+
+
+def teardown_function(function):
+    _clear_registry()
+
+
+def _dummy_codec(tag):
+    return Codec(capnp_type=tag, to_capnp=lambda value: (tag, value), from_capnp=lambda reader: reader.value)
+
+
+def test_register_and_find_codec_exact_type():
+    module = types.ModuleType("example_capnp")
+    codec = _dummy_codec("id")
+
+    register_codec(module, _Base, codec)
+
+    resolved = find_codec(module, _Base)
+    assert resolved is codec
+
+
+def test_find_codec_accepts_instances():
+    module = types.ModuleType("example_capnp")
+    codec = _dummy_codec("id")
+
+    register_codec(module, _Base, codec)
+
+    instance = _Base()
+    assert find_codec(module, instance) is codec
+
+
+def test_mro_lookup_uses_base_class_registration():
+    module = types.ModuleType("example_capnp")
+    codec = _dummy_codec("base")
+
+    register_codec(module, _Base, codec)
+
+    resolved = find_codec(module, _Child)
+    assert resolved is codec
+
+
+def test_register_codec_validates_inputs():
+    module = types.ModuleType("example_capnp")
+    codec = _dummy_codec("base")
+
+    with pytest.raises(CodecRegistrationError):
+        register_codec(object(), _Base, codec)
+
+    with pytest.raises(CodecRegistrationError):
+        register_codec(module, 123, codec)  # type: ignore[arg-type]
+
+    with pytest.raises(CodecRegistrationError):
+        register_codec(module, _Base, object())  # type: ignore[arg-type]
+
+
+def test_find_codec_raises_helpful_error_when_missing():
+    module = types.ModuleType("example_capnp")
+
+    with pytest.raises(CodecLookupError) as excinfo:
+        find_codec(module, _Base)
+
+    message = str(excinfo.value)
+    assert "example_capnp" in message
+    assert "_Base" in message
+
+
+def test_find_codec_lists_available_types():
+    module = types.ModuleType("example_capnp")
+    register_codec(module, _Base, _dummy_codec("base"))
+
+    class _Unrelated:  # local type to force failure branch
+        pass
+
+    with pytest.raises(CodecLookupError) as excinfo:
+        find_codec(module, _Unrelated)
+
+    assert "_Base" in str(excinfo.value)
+