Add msgpack as alternative serialization format for entity storage #2

@deucalioncodes

Problem

ic-python-db currently uses json.dumps/json.loads for all entity serialization in db_engine.py. CPython 3.13 on WASI likely has the C-accelerated _json module, but JSON still carries overhead:

  1. The JSON text format is larger than binary alternatives (string quoting, key repetition)
  2. Every db.save() and db.load() goes through a JSON encode/decode
  3. On IC canisters, larger payloads mean more StableBTreeMap storage and more cycles

Proposal

Add msgpack as an alternative (or replacement) serialization format in db_engine.py.

Benefits

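  • Smaller payloads: msgpack output is typically 30-50% smaller than JSON for the same data, though the actual saving depends on the shape of the entity
  • Faster encoding/decoding: a C-accelerated msgpack should be significantly faster than C-accelerated JSON; pure-Python msgpack may be competitive for small dicts, but that needs benchmarking on WASI (see Considerations)
  • Binary-native: strings are stored length-prefixed, so keys and values need no quoting or escaping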
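The size claim is easy to check against real entities before committing to the change. A minimal sketch, assuming the msgpack package may or may not be installed; payload_sizes is a hypothetical helper, not part of ic-python-db:

```python
import json

try:
    import msgpack  # third-party; optional for this measurement
except ImportError:
    msgpack = None


def payload_sizes(entity: dict) -> dict:
    """Return the encoded size in bytes of `entity` for each available format."""
    # json.dumps with default separators, matching a plain json.dumps(data) call
    sizes = {"json": len(json.dumps(entity).encode("utf-8"))}
    if msgpack is not None:
        sizes["msgpack"] = len(msgpack.packb(entity))
    return sizes


if __name__ == "__main__":
    # Hypothetical entity shape, for illustration only
    sample = {"_type": "User", "_id": "1", "name": "alice", "age": 30}
    print(payload_sizes(sample))
```

Running this over a few representative entities from a real canister would show whether the saving is large enough to justify the migration work.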

Implementation approach

The change is localized to db_engine.py — swap json.dumps/json.loads with msgpack equivalents:

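# db_engine.py
import msgpack  # third-party; fall back to json if it is unavailable

# Methods of the engine class (enclosing class omitted)
def save(self, type_name, id, data):
    key = f"{type_name}@{id}"
    # packb returns bytes, so the storage backend must accept bytes values
    self._db_storage.insert(key, msgpack.packb(data))

def load(self, type_name, id):
    key = f"{type_name}@{id}"
    data = self._db_storage.get(key)
    # Compare against None so only a missing key returns None
    if data is not None:
        return msgpack.unpackb(data)
    return None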
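The "or fallback to json" comment above could look roughly like this. This is a sketch, not the final design: encode and decode are hypothetical helper names, and it assumes the storage layer accepts bytes values in both cases.

```python
import json

try:
    import msgpack  # third-party; may be missing in a given build
except ImportError:
    msgpack = None


def encode(data: dict) -> bytes:
    """Serialize an entity dict to bytes, preferring msgpack when available."""
    if msgpack is not None:
        return msgpack.packb(data, use_bin_type=True)
    # Compact separators keep the JSON fallback as small as possible
    return json.dumps(data, separators=(",", ":")).encode("utf-8")


def decode(raw: bytes) -> dict:
    """Inverse of encode() within the same build (same format choice)."""
    if msgpack is not None:
        return msgpack.unpackb(raw, raw=False)
    return json.loads(raw.decode("utf-8"))
```

One caveat with this approach: the two formats are not fully interchangeable. JSON turns non-string dict keys into strings and cannot encode bytes values, while msgpack preserves both, so a round trip can differ depending on which branch is active.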

Considerations

  • Storage interface change: Storage.insert/get currently use str values. With msgpack the values would be bytes. Need to update Storage ABC and MemoryStorage.
  • Migration: Existing canister data is JSON-encoded. Need a migration path or dual-format support (try msgpack first, fall back to JSON).
  • WASI C extension: For maximum benefit, the msgpack C extension would need to be cross-compiled for wasm32-wasip1 and linked into the CPython WASM binary. Pure-Python msgpack may not outperform C-accelerated JSON.
  • Audit log: Also uses json.dumps — should be updated consistently.
  • Depends on: Issue #1 (Fix redundant save on every Entity.load(): eliminates unnecessary json.dumps + storage write) should be fixed first to reduce the number of serialization calls before optimizing the serialization itself.
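For the storage interface and migration points above, one possible shape is sketched below. BytesStorage, MemoryBytesStorage and decode_entity are hypothetical names, not the existing Storage ABC or MemoryStorage. The sketch assumes every stored entity is a dict: JSON object text always starts with "{" (byte 0x7b), while a msgpack map starts with 0x80-0x8f, 0xde or 0xdf, so the first byte tells the two formats apart without a trial decode.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional

try:
    import msgpack  # third-party; only needed once msgpack values exist
except ImportError:
    msgpack = None


class BytesStorage(ABC):
    """Hypothetical bytes-valued variant of the Storage interface."""

    @abstractmethod
    def insert(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...


class MemoryBytesStorage(BytesStorage):
    """In-memory implementation, analogous to MemoryStorage."""

    def __init__(self) -> None:
        self._data = {}

    def insert(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def decode_entity(raw) -> dict:
    """Decode a stored value that may be legacy JSON or msgpack."""
    if isinstance(raw, str):
        # Value written through the old str-based interface
        return json.loads(raw)
    if raw[:1] == b"{":
        # JSON object text; a msgpack-encoded dict never starts with 0x7b
        return json.loads(raw.decode("utf-8"))
    if msgpack is None:
        raise RuntimeError("msgpack-encoded value found but msgpack is not installed")
    return msgpack.unpackb(raw, raw=False)
```

With a reader like this, existing JSON records stay readable and get rewritten as msgpack the next time they are saved, so no bulk migration pass is required. The audit log would need the same treatment if its entries are read back.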
