216 changes: 216 additions & 0 deletions ROBOTS.md
@@ -0,0 +1,216 @@
# ROBOTS.md — Python Generics for Cap’n Proto (pycapnp enhancement)

## 1) Mission

Implement **first-class Python generics for Cap’n Proto schema generics** in `pycapnp` via a new codegen path and a tiny runtime. Developers should be able to use:

```py
from my_schema.location_generic import Location
loc = Location[str]("NYC", 40.71, -74.0)
loc2 = Location[int](2**63, 51.5, -0.12)
```

…with accurate type hints, minimal overhead, and **no schema changes**.

## 2) Goals (Must)

* Generate **`typing.Generic[...]` classes** for Cap’n Proto generic structs/interfaces.
* Hide the **pointer-parameter constraint** behind a **codec registry** (auto wrap/unbox scalars and custom types).
* Ship **PEP 561 stubs** so mypy/pyright infer `Location[T].id: T`.
* Preserve wire compatibility and the existing raw API.
* Keep overhead negligible vs current `pycapnp` (<2% in simple construct+serialize microbenchmarks).

## 3) Non-goals (Won’t for v1)

* No HKTs (higher-kinded types).
* No cross-module generic unification beyond explicit codec registration.
* No changes to Cap’n Proto core or schema language.

## 4) Outputs / Deliverables

1. **Runtime module**: `capnp/_generic.py` (Codec, registry, helpers).
2. **Codegen templates** (Jinja or Python format strings):

   * `*_generic.py` wrapper per generic type.
   * `*_generic.pyi` stubs per generic type.
   * `*_codecs.py` per schema module (default codec registrations when wrappers exist).
3. **Compiler flag**: `--python-generics={off|stubs|full}`; default `off`.
4. **Docs**: new “Python generics” page + migration notes.
5. **Tests**: unit, typing, and microbenchmarks.

## 5) High-level Design

### 5.1 Runtime (`capnp/_generic.py`)

* `Codec(capnp_type, to_capnp, from_capnp)`.
* `_registry[schema_module_name][py_type] -> Codec` with MRO fallback.
* API: `register_codec(schema_module, py_type, codec)` and `find_codec(schema_module, py_type)`.
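
The registry and its MRO fallback can be sketched in a few lines. This is a simplified stand-in rather than the shipped module: it keys on a plain module-name string and raises a bare `LookupError`, and the `Celsius`, `PreciseCelsius`, `"Float64Box"` and `"demo_capnp"` names are hypothetical.

```py
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Codec:
    capnp_type: Any
    to_capnp: Callable[[Any], Any]
    from_capnp: Callable[[Any], Any]


# module name -> {python type -> codec}
_registry: Dict[str, Dict[type, Codec]] = {}


def register_codec(schema_module: str, py_type: type, codec: Codec) -> None:
    _registry.setdefault(schema_module, {})[py_type] = codec


def find_codec(schema_module: str, py_type: type) -> Codec:
    per_module = _registry.get(schema_module, {})
    # Walk the MRO so a codec registered on a base class covers subclasses.
    for cls in py_type.__mro__:
        if cls in per_module:
            return per_module[cls]
    raise LookupError(f"no codec for {py_type.__qualname__} in {schema_module}")


class Celsius(float):  # hypothetical user type
    pass


class PreciseCelsius(Celsius):  # hypothetical subclass, never registered itself
    pass


celsius_codec = Codec("Float64Box", float, Celsius)
register_codec("demo_capnp", Celsius, celsius_codec)

# The subclass resolves to the codec registered for its base class.
resolved = find_codec("demo_capnp", PreciseCelsius)
```

Because the lookup starts at the most derived class, a codec registered directly on a subclass would take precedence over one registered on its base.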

### 5.2 Generated Wrapper (per generic type)

Given schema `Foo(P1, P2, ...)`, emit `foo_generic.py` exposing:

```py
P1 = TypeVar("P1"); P2 = TypeVar("P2")
class Foo(Generic[P1, P2]):
    def __init__(self, a1: P1, a2: P2, ...):
        codec1 = find_codec(_mod, type(a1))
        codec2 = find_codec(_mod, type(a2))
        impl = _mod.Foo[codec1.capnp_type, codec2.capnp_type]
        # construct message, set fields via codecs
```

* `from_bytes(cls, data: bytes, py_types: tuple[type,...] | Type[P1]...)` loads the **concrete** instantiation and returns a typed wrapper.
* Properties unwrap via `from_capnp`.
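
A runnable approximation of such a wrapper, for the `Location(Id)` schema, might look like the following. It is only a sketch under loud assumptions: a `SimpleNamespace` stands in for the Cap'n Proto message, the `_CODECS` table replaces the real `find_codec(_mod, ...)` call, and the `"Text"`/`"Number64"` strings are placeholders for the actual pointer types.

```py
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Callable, Generic, TypeVar

Id = TypeVar("Id")


@dataclass(frozen=True)
class Codec:  # stand-in for capnp._generic.Codec
    capnp_type: Any
    to_capnp: Callable[[Any], Any]
    from_capnp: Callable[[Any], Any]


MASK64 = (1 << 64) - 1

# Hypothetical codec table; a generated wrapper would call find_codec instead.
_CODECS = {
    str: Codec("Text", lambda s: s, str),
    int: Codec(
        "Number64",
        lambda n: SimpleNamespace(value=n & MASK64),  # wrap the scalar
        lambda r: int(r.value),                       # unwrap on read
    ),
}


class Location(Generic[Id]):
    """Illustrative wrapper: stores the encoded id, decodes it on access."""

    def __init__(self, id: Id, lat: float, lon: float) -> None:
        self._codec = _CODECS[type(id)]
        # A generated wrapper would build a Cap'n Proto message here.
        self._msg = SimpleNamespace(id=self._codec.to_capnp(id), lat=lat, lon=lon)

    @property
    def id(self) -> Id:
        return self._codec.from_capnp(self._msg.id)

    @property
    def lat(self) -> float:
        return self._msg.lat

    @property
    def lon(self) -> float:
        return self._msg.lon


a = Location[str]("NYC", 40.71, -74.0)
b = Location[int](2**63, 51.5, -0.12)
```

Note that the subscript in `Location[str](...)` only informs the type checker; at runtime this sketch picks the codec from the type of the argument, as the template above does with `type(a1)`.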

### 5.3 Type Stubs (PEP 561)

* Emit `.pyi` alongside wrappers with `Generic[...]` and precise field types.
* Provide overloads for common `T` (str/int/bytes/uuid.UUID) to improve inference.
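
The overload idea can be illustrated with `typing.overload` on a classmethod. This is a sketch, not the generated stub: the single `py_type` parameter and the toy byte decoding are assumptions made so the example runs without a schema, and real wrappers would decode a Cap'n Proto message instead.

```py
import uuid
from typing import Any, Generic, Type, TypeVar, overload

T = TypeVar("T")


class Location(Generic[T]):
    def __init__(self, id: T) -> None:
        self.id = id

    # One overload per common parameter type lets checkers infer Location[T].
    @overload
    @classmethod
    def from_bytes(cls, data: bytes, py_type: Type[str]) -> "Location[str]": ...
    @overload
    @classmethod
    def from_bytes(cls, data: bytes, py_type: Type[int]) -> "Location[int]": ...
    @overload
    @classmethod
    def from_bytes(cls, data: bytes, py_type: Type[bytes]) -> "Location[bytes]": ...
    @overload
    @classmethod
    def from_bytes(cls, data: bytes, py_type: Type[uuid.UUID]) -> "Location[uuid.UUID]": ...

    @classmethod
    def from_bytes(cls, data: bytes, py_type: Any) -> "Location[Any]":
        # Toy decoding only (assumption); not the Cap'n Proto wire format.
        if py_type is str:
            return cls(data.decode("utf-8"))
        if py_type is int:
            return cls(int.from_bytes(data, "little"))
        if py_type is bytes:
            return cls(bytes(data))
        if py_type is uuid.UUID:
            return cls(uuid.UUID(bytes=data))
        raise TypeError(f"unsupported parameter type: {py_type!r}")


text_loc = Location.from_bytes(b"NYC", str)
int_loc = Location.from_bytes((2**63).to_bytes(8, "little"), int)
```

In a `.pyi` stub only the `@overload` signatures would appear; the final untyped implementation belongs in the `.py` wrapper.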

### 5.4 Default Codecs

Emit module-level registrations in `*_codecs.py` (imported by `*_generic.py`):

| Python type | Cap’n pointer type | Wrapper required | to_capnp | from_capnp |
| ----------- | --------------------- | ---------------- | ------------------- | --------------------- |
| `str` | `Text` | No | identity | `str(r)` |
| `bytes` | `Data` via `BytesRef` | Yes | new wrapper message | `bytes(r.value)` |
| `int` | `Number64` | Yes | mask → wrapper | `int(r.value)` |
| `uuid.UUID` | `Uuid128` | Yes | split → wrapper | merge → `UUID(int=…)` |

> Emit only when the schema module defines the necessary wrapper structs.
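
The "mask" and "split/merge" conversions in the table are plain integer arithmetic. The helper names below are hypothetical, and the sketch covers only the arithmetic, not the construction of the `Number64` or `Uuid128` wrapper messages.

```py
import uuid

MASK64 = (1 << 64) - 1


def int_to_u64(n: int) -> int:
    """Mask a Python int into the UInt64 range (out-of-range values wrap)."""
    return n & MASK64


def uuid_to_halves(u: uuid.UUID) -> tuple[int, int]:
    """Split a 128-bit UUID into (hi, lo) 64-bit halves."""
    return (u.int >> 64) & MASK64, u.int & MASK64


def halves_to_uuid(hi: int, lo: int) -> uuid.UUID:
    """Merge two 64-bit halves back into a UUID."""
    return uuid.UUID(int=(hi << 64) | lo)


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
hi, lo = uuid_to_halves(u)
restored = halves_to_uuid(hi, lo)
```

One consequence worth keeping in mind: masking wraps silently, so `int_to_u64(-1)` yields `2**64 - 1` rather than raising.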

## 6) Codegen Integration

* Extend the Python backend of `capnp compile -opython` to:

  1. Parse the AST: detect generic types and their parameters.
  2. Emit raw module as today (e.g., `location_capnp.py`).
  3. If `--python-generics=stubs|full`, render `*_generic.(py|pyi)`.
  4. If `--python-generics=full`, render `*_codecs.py` and autoload it from the wrapper.
* Naming: place wrappers next to raw module; suffix `_generic` to avoid import cycles.
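
The mapping from flag value to emitted files can be summarized as a small function. `planned_outputs` is a hypothetical helper written for illustration; it only mirrors the steps listed above and is not part of the compiler backend.

```py
def planned_outputs(stem: str, mode: str) -> list[str]:
    """Return the file names one compile run would emit for a schema stem."""
    if mode not in {"off", "stubs", "full"}:
        raise ValueError(f"unknown --python-generics mode: {mode!r}")
    outputs = [f"{stem}_capnp.py"]  # raw module, emitted in every mode
    if mode in {"stubs", "full"}:
        # Wrapper and stub sit next to the raw module with a _generic suffix.
        outputs += [f"{stem}_generic.py", f"{stem}_generic.pyi"]
    if mode == "full":
        outputs.append(f"{stem}_codecs.py")  # default codec registrations
    return outputs


full_outputs = planned_outputs("location", "full")
```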

## 7) Backward Compatibility

* No behavioral change when flag is `off` (default).
* Existing imports continue to work (`import location_capnp as capnp`).
* New API is additive: `from my_schema.location_generic import Location`.

## 8) CLI Flags & Behavior

* `--python-generics=off|stubs|full`.
* `--python-generics-scalar=wrap|error` (controls default codec emission for scalars).
* Future: `--python-generics-anypointer` to generate `AnyPointer`-based dynamic wrappers (out-of-scope for v1).

## 9) Testing Plan

### 9.1 Unit

* Round-trip for `str`, big `int (>= 2**63)`, `bytes`, `uuid.UUID`.
* Multi-parameter generic: `Pair(A,B)`.
* Missing codec → `CodecLookupError` (a `LookupError` subclass) with helpful message.
* Wrapper-free path (Text/Data) remains zero-copy.

### 9.2 Typing

* `mypy` + `pyright` tests: `reveal_type(Location[str](...).id) -> builtins.str` etc.
* Overload resolution for `from_bytes(blob, str)`.

### 9.3 Benchmarks (pytest-benchmark)

* Construct + set + `to_bytes_packed` for raw vs generic wrapper.
* Target: ≤2% overhead median.
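
The overhead check itself reduces to comparing two medians. The sketch below uses the standard library's `timeit` instead of pytest-benchmark so it runs anywhere, and both workloads are dictionary-building stand-ins (assumptions) rather than real pycapnp calls.

```py
import statistics
import timeit


def raw_construct() -> dict:
    # Stand-in for building and serializing a raw pycapnp message.
    return {"id": "NYC", "lat": 40.71, "lon": -74.0}


def wrapped_construct() -> dict:
    # Stand-in for the generic wrapper: same work plus one codec-like call.
    return {"id": str("NYC"), "lat": 40.71, "lon": -74.0}


def median_seconds(fn, repeat: int = 7, number: int = 20_000) -> float:
    """Median wall time of `number` calls, over `repeat` runs."""
    return statistics.median(timeit.repeat(fn, repeat=repeat, number=number))


def relative_overhead(baseline: float, candidate: float) -> float:
    """Fractional slowdown of candidate versus baseline (0.02 means 2%)."""
    return (candidate - baseline) / baseline


overhead = relative_overhead(
    median_seconds(raw_construct), median_seconds(wrapped_construct)
)
within_budget = overhead <= 0.02  # the 2% target from the plan
```

Microbenchmarks at this scale are noisy, so a real suite would likely need more repetitions or a tolerance band before failing a build on the 2% line.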

## 10) Repository Layout Changes

```
pycapnp/
  _generic.py               # NEW runtime
  codegen/
    templates/
      generic_wrapper.py.j2
      generic_wrapper.pyi.j2
      module_codecs.py.j2
  ... existing files ...
```

## 11) Acceptance Criteria

* ✅ Flagged build emits wrappers/stubs for generic schemas.
* ✅ `pip install` publishes `py.typed` and stubs.
* ✅ Typing tests pass on CPython 3.10–3.13.
* ✅ Benchmarks meet perf budget.
* ✅ Docs page with end-to-end example and codec recipe.

## 12) Rollout Plan

1. Land runtime + codegen behind `--python-generics=stubs`.
2. Add default codecs + `full` mode.
3. Publish pre-release (`0.x`) to gather feedback.
4. Stabilize and document integration patterns (dataclasses/Pydantic codecs).

## 13) Risks & Mitigations

* **Codec explosion**: encourage per-module codecs; document conventions.
* **Unsigned semantics**: recommend `bigint` for JS/TS consumers; in Python, `int` is unbounded.
* **User surprise on Any**: be explicit that *this is not AnyPointer*; codecs maintain type safety.

## 14) Developer Guide (How to)

1. Build local: `pip install -e .` (dev) and ensure `capnp` compiler on PATH.
2. Compile schema: `capnp compile -opython --python-generics=full schema/location.capnp`.
3. Use in app:

```py
from my_schema.location_generic import Location
a = Location[str]("NYC", 40.71, -74.0)
b = Location[int](2**63, 51.5, -0.12)
```
4. Register custom type:

```py
from capnp._generic import Codec, register_codec
from my_schema import location_capnp as mod

class OrderId:
    def __init__(self, n: int) -> None:
        self.n = n

register_codec(mod, OrderId, Codec(
    mod.Number64,
    lambda o: mod.Number64.new_message(value=o.n),  # OrderId -> Number64 builder
    lambda r: OrderId(int(r.value)),                # Number64 reader -> OrderId
))
```

## 15) Example Schema for CI tests

```capnp
@0x9a2b3c4d1a2b3c4d;
struct Location(Id) { id @0 :Id; lat @1 :Float64; lon @2 :Float64; }
struct Number64 { value @0 :UInt64; }
struct BytesRef { value @0 :Data; }
struct Uuid128 { hi @0 :UInt64; lo @1 :UInt64; }
struct Pair(A, B) { a @0 :A; b @1 :B; }
```

## 16) CI Matrix

* OS: ubuntu-latest, macos-latest, windows-latest.
* Py: 3.10, 3.11, 3.12, 3.13.
* Steps: build, `capnp compile` samples, unit + typing tests, benchmark smoke.

## 17) Documentation TODO

* New page: *Python Generics in pycapnp* with quickstart, codec cookbook, and pitfalls.
* Update README: reference `--python-generics`.

## 18) Out of Scope / Future Work

* Auto-derivation of codecs from dataclasses/attrs/pydantic via reflection.
* `AnyPointer` dynamic generic wrapper (type-tagged).
* IDE plugin to generate wrapper codecs from schema automatically.

## 19) Owner & Contact

* Tech Lead: **Ellis Breen** (proposer)
* Maintainers to ping: `pycapnp` maintainers; Cap’n Proto community for review.

---

**End of ROBOTS.md**
154 changes: 154 additions & 0 deletions capnp/_generic.py
@@ -0,0 +1,154 @@
"""Runtime support for generated Python generic wrappers.

This module exposes a minimal registry that maps Python types to the
corresponding Cap'n Proto pointer types alongside the conversion functions
needed to bridge between them. Generated wrappers for generic schema types
look up codecs through :func:`find_codec` so that user code can seamlessly
work with rich Python types (``uuid.UUID`` instances, integers above the
signed ``int64`` range that still fit in ``UInt64``, etc.) without having to
manually interact with the low level pointer-parameter constraints of
Cap'n Proto.

The runtime is intentionally tiny – it keeps just enough state to resolve the
``Codec`` for a ``(schema_module, python_type)`` pair, with support for
walking the Python type's MRO to honour registrations on base classes. This
mirrors how ``isinstance`` treats subclasses, which keeps the runtime
predictable and easy to reason about for library users.
"""

from __future__ import annotations

from dataclasses import dataclass
from types import ModuleType
from typing import Any, Callable, Dict, Type

__all__ = [
    "Codec",
    "CodecRegistrationError",
    "CodecLookupError",
    "find_codec",
    "register_codec",
]


CodecEncoder = Callable[[Any], Any]
CodecDecoder = Callable[[Any], Any]


@dataclass(frozen=True)
class Codec:
    """A pair of conversion functions for a Cap'n Proto pointer parameter.

    Parameters
    ----------
    capnp_type:
        The Cap'n Proto pointer type that should be used when instantiating the
        generic schema. Generated wrappers pass this value directly to the
        underlying ``capnp`` module when constructing the concrete schema type.
    to_capnp:
        A callable that receives a Python value and returns the object that
        should be written into the Cap'n Proto message. For pointer types this
        is typically a builder returned by ``new_message`` or, for zero-copy
        scenarios, a value that can be assigned to the field directly.
    from_capnp:
        A callable that receives the reader object retrieved from the Cap'n
        Proto message and returns the corresponding Python value.
    """

    capnp_type: Any
    to_capnp: CodecEncoder
    from_capnp: CodecDecoder


class CodecRegistrationError(TypeError):
    """Raised when an invalid codec registration is attempted."""


class CodecLookupError(LookupError):
    """Raised when no codec could be found for the requested type."""


# The registry is keyed by the schema module and then by the Python type that
# the codec can serialise. Using the module's ``__name__`` keeps lookups stable
# even when the same module object is imported under multiple aliases.
_registry: Dict[str, Dict[Type[Any], Codec]] = {}


def _module_key(schema_module: ModuleType | str) -> str:
    if isinstance(schema_module, ModuleType):
        return schema_module.__name__
    if isinstance(schema_module, str):
        return schema_module
    raise CodecRegistrationError(
        "schema_module must be a module or module name, got "
        f"{type(schema_module)!r}"
    )


def register_codec(schema_module: ModuleType | str, py_type: Type[Any], codec: Codec) -> None:
    """Register *codec* for ``py_type`` within ``schema_module``.

    Parameters
    ----------
    schema_module:
        The module that hosts the generated Cap'n Proto schema (e.g.
        ``my_schema.location_capnp``). This can be the module object itself or
        its qualified name.
    py_type:
        The Python type that the codec handles. The codec will be used for the
        exact type as well as subclasses by virtue of MRO based lookups.
    codec:
        The :class:`Codec` instance describing how to convert between Python
        values and the Cap'n Proto representation.
    """

    if not isinstance(py_type, type):
        raise CodecRegistrationError(
            f"py_type must be a Python type, got {type(py_type)!r}"
        )
    if not isinstance(codec, Codec):
        raise CodecRegistrationError(
            f"codec must be an instance of Codec, got {type(codec)!r}"
        )

    key = _module_key(schema_module)
    _registry.setdefault(key, {})[py_type] = codec


def find_codec(schema_module: ModuleType | str, py_type: Type[Any] | Any) -> Codec:
    """Return the codec registered for ``py_type`` within ``schema_module``.

    The lookup walks ``py_type``'s MRO so that codecs registered for a base
    class automatically apply to subclasses. If no codec is registered a
    :class:`CodecLookupError` is raised with a helpful message that lists all
    known codecs for the schema module.
    """

    # Accept either a type or an instance; instances resolve via their type.
    if not isinstance(py_type, type):
        py_type = type(py_type)

    key = _module_key(schema_module)
    module_registry = _registry.get(key)
    if not module_registry:
        raise CodecLookupError(
            f"No codecs registered for schema module '{key}', cannot encode type "
            f"{py_type.__module__}.{py_type.__qualname__}. "
            "Use register_codec() to provide conversions."
        )

    # Most derived class first, so the closest registration wins.
    for cls in py_type.__mro__:
        codec = module_registry.get(cls)
        if codec is not None:
            return codec

    available = ", ".join(sorted(t.__name__ for t in module_registry)) or "<none>"
    raise CodecLookupError(
        f"No codec registered for type {py_type.__module__}.{py_type.__qualname__} "
        f"in schema module '{key}'. Available codecs: {available}"
    )


def _clear_registry() -> None:
    """Helper used by tests to reset the registry state."""

    _registry.clear()
