
PyPersistent

A high-performance collection of persistent (immutable) data structures for Python, written in C++.

Features

  • Immutable: All operations return new collections, leaving originals unchanged
  • Structural Sharing: New versions share most structure (O(log n) changes instead of O(n) copies)
  • Fast: 3-5x faster than pyrsistent, with merge operations up to 176x faster
  • Thread-Safe: Immutability makes concurrent access safe without locks
  • Python 3.13+ Free-Threading Ready: Lock-free design with atomic reference counting for true parallelism
  • Memory Efficient: Structural sharing minimizes memory overhead
  • Dual Interface: Both functional (Clojure-style) and Pythonic APIs
  • Pickle Support: Native serialization for all data structures

Data Structures

PyPersistent provides five core persistent data structures:

PersistentDict

Unordered key-value map based on Hash Array Mapped Trie (HAMT).

  • Use for: General-purpose dictionary needs with immutability
  • Time complexity: O(log₃₂ n) ≈ 4 steps for 1M elements
  • Features: Fast lookups, structural sharing, bulk merge operations
  • Example:
    from pypersistent import PersistentDict
    
    m = PersistentDict.create(name='Alice', age=30)
    m2 = m.set('city', 'NYC')
    m3 = m | {'role': 'developer'}  # Merge

PersistentSortedDict

Sorted key-value map based on Left-Leaning Red-Black Tree.

  • Use for: Ordered data, range queries, min/max operations
  • Time complexity: O(log₂ n) for all operations
  • Features: Sorted iteration, range queries (subseq/rsubseq), first/last
  • Example:
    from pypersistent import PersistentSortedDict
    
    m = PersistentSortedDict.from_dict({3: 'c', 1: 'a', 2: 'b'})
    list(m.keys())  # [1, 2, 3] - always sorted
    
    # Range queries [start, end) - start inclusive, end exclusive
    sub = m.subseq(1, 3)  # or m[1:3]
    list(sub.keys())  # [1, 2]
    
    # Min/max
    m.first()  # (1, 'a')
    m.last()   # (3, 'c')

PersistentList

Indexed sequence based on bit-partitioned vector trie (RRB-Tree variant).

  • Use for: Ordered sequences with efficient random access and append
  • Time complexity: O(log₃₂ n) for get/set, O(1) for append
  • Features: Fast indexed access, efficient append, slicing
  • Example:
    from pypersistent import PersistentList
    
    v = PersistentList.create(1, 2, 3)
    v2 = v.conj(4)  # Append
    v3 = v2.assoc(0, 10)  # Update index 0
    v[1]  # 2 - indexed access

PersistentSet

Unordered set based on HAMT (same as PersistentDict).

  • Use for: Unique collection of items, set operations
  • Time complexity: O(log₃₂ n) for add/remove/contains
  • Features: Fast membership testing, set operations (union, intersection, difference)
  • Example:
    from pypersistent import PersistentSet
    
    s = PersistentSet.create(1, 2, 3)
    s2 = s.conj(4)  # Add
    s3 = s.disj(2)  # Remove
    2 in s  # True

PersistentArrayMap

Small map optimization using array of key-value pairs.

  • Use for: Maps with < 8 entries (automatic optimization)
  • Time complexity: O(n) linear scan, but faster than HAMT for tiny maps
  • Features: Lower memory overhead, faster for very small maps
  • Note: Typically used internally; PersistentDict automatically uses this for small maps

Choosing the Right Data Structure

| Need | Use | Why |
|---|---|---|
| General key-value storage | PersistentDict | Fastest for unordered data |
| Sorted keys / range queries | PersistentSortedDict | Maintains sort order, supports ranges |
| Indexed sequence | PersistentList | Fast random access and append |
| Unique items / set operations | PersistentSet | Membership testing, set algebra |
| Very small dicts (< 8 items) | PersistentArrayMap | Lower overhead for tiny dicts |

Performance

Why Use Persistent Data Structures?

The key advantage of persistent data structures is structural sharing: when you create a modified version, the new and old versions share most of their internal structure rather than duplicating everything.

Example:

# Python dict - must copy everything
d1 = {i: f'val{i}' for i in range(10000)}
d2 = d1.copy()  # Copies all 10,000 entries
d2['new_key'] = 'new_value'

# PersistentDict - shares structure
from pypersistent import PersistentDict
m1 = PersistentDict.from_dict({i: f'val{i}' for i in range(10000)})
m2 = m1.set('new_key', 'new_value')  # Shares 99.99% of structure with m1
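Structural sharing is easiest to see in a pure-Python sketch (illustrative only; pypersistent's structures are C++ tries, not this code): a persistent cons list, where prepending reuses the entire old list as the tail of the new one.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """One immutable cell of a persistent (cons) list."""
    value: Any
    rest: Optional["Node"]


def prepend(lst: Optional[Node], value: Any) -> Node:
    # O(1): the new version reuses the old list, unchanged, as its tail.
    return Node(value, lst)


def to_list(lst: Optional[Node]) -> list:
    out = []
    while lst is not None:
        out.append(lst.value)
        lst = lst.rest
    return out


v1 = prepend(prepend(None, 2), 1)   # [1, 2]
v2 = prepend(v1, 0)                 # [0, 1, 2]

assert to_list(v1) == [1, 2]        # v1 is unchanged
assert v2.rest is v1                # v2 shares all of v1's structure
```

The trie-based structures generalize this: instead of sharing a whole tail, they share all subtrees off the edited path.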

When to Use pypersistent

✅ Use pypersistent when:

  • Creating multiple versions - Undo/redo, snapshots, version history
  • Concurrent access - Multiple threads safely reading shared data without locks
  • Functional programming - Immutable data flow, pure functions
  • Building up collections incrementally - Each addition shares structure with previous version

❌ Use standard Python dict/list when:

  • You only need one mutable version
  • Maximum raw construction speed is critical
  • Extremely memory-constrained environments

Performance vs Python Standard Library

PersistentDict vs dict

| Operation | pypersistent | Python dict | Speedup |
|---|---|---|---|
| Structural Sharing (100 variants of 10K dict) | 74µs | 1.79ms | 24x faster 🚀 |
| Structural Sharing (100 variants of 1M dict) | 158µs | 517ms | 3271x faster 🚀 |
| Lookup (1K ops) | 776µs | 427µs | 1.8x slower |
| Construction (1M elements) | 2.74s | 299ms | 9x slower |

Key insight: Creating multiple versions through structural sharing is 24-3271x faster than copying dicts, with the advantage growing dramatically for larger collections.

PersistentList vs list

| Operation | pypersistent | Python list | Speedup |
|---|---|---|---|
| Structural Sharing (100 variants of 10K list) | 74µs | 1.79ms | 24x faster 🚀 |
| Append 100 elements | 42µs | 2.3µs | 18x slower |
| Random access (100 ops) | 13µs | 2.7µs | 5x slower |

Key insight: If you need multiple versions (undo, history), PersistentList is dramatically faster.

Performance vs pyrsistent

pyrsistent is a mature pure-Python library with similar persistent data structures. pypersistent offers better performance through its C++ implementation, but with a smaller feature set.

PersistentDict Performance Comparison

| Size | Operation | pypersistent (C++) | pyrsistent (Python) | Speedup |
|---|---|---|---|---|
| 100K | Merge (50K+50K) | 563µs | 98.9ms | 176x faster 🚀 |
| 10K | Merge (5K+5K) | 155µs | 9.12ms | 59x faster 🚀 |
| 10K | Lookup (1K ops) | 157µs | 476µs | 3.0x faster |
| 10K | Update (100 ops) | 64µs | 287µs | 4.5x faster |
| 10K | Construction | 811µs | 1.95ms | 2.4x faster |
| 10K | Iteration | 1.27ms | 732µs | 1.7x slower |

Why the difference? pypersistent's C++ implementation provides:

  • Optimized tree traversal and node allocation
  • Cache-friendly memory layout
  • Specialized bulk merge algorithms
  • Direct memory manipulation without Python object overhead

Feature Comparison

| Feature | pypersistent | pyrsistent |
|---|---|---|
| PersistentDict (pmap) | ✅ | ✅ |
| PersistentList (pvector) | ✅ | ✅ |
| PersistentSet (pset) | ✅ | ✅ |
| PersistentSortedDict | ✅ | ❌ |
| PersistentArrayMap (small dicts) | ✅ | ❌ |
| PersistentBag (multiset, pbag) | ❌ | ✅ |
| PersistentDeque (pdeque) | ❌ | ✅ |
| PersistentRecord/Class (PRecord/PClass) | ❌ | ✅ |
| Evolvers (transient mutations) | ❌ | ✅ |
| Freeze/thaw (deep conversion) | ❌ | ✅ |
| Type checking integration | ❌ | ✅ |

Summary: Choose pypersistent for performance-critical applications with the core data structures. Choose pyrsistent for a richer feature set and pure-Python portability.

Performance by Map Size

| Size | Construction | Lookup (1K ops) | Sharing (100 variants) |
|---|---|---|---|
| 100 | 73µs | 19µs | 77µs |
| 1K | 765µs | 205µs | 95µs |
| 10K | 9.7ms | 228µs | 108µs |
| 100K | 110ms | 255µs | 133µs |
| 1M | 2.74s | 776µs | 158µs |

Fast Iteration Methods

For complete iteration over small-medium dicts, use materialized list methods:

m = PersistentDict.from_dict({...})

# Fast methods (1.7-3x faster for dicts < 100K)
items = m.items_list()   # Returns list of (key, value) tuples
keys = m.keys_list()     # Returns list of keys
values = m.values_list() # Returns list of values

# Lazy iterators (better for very large dicts or early exit)
for k, v in m.items():   # Generator, O(log n) memory
    ...

Performance:

  • Dicts ≤ 10K: 3x faster with items_list()
  • Dicts ≤ 100K: 1.7x faster with items_list()
  • Dicts > 100K: Use iterator (lazy, memory-efficient)

Technical Details

Implementation: C++ HAMT (Hash Array Mapped Trie) with:

  • Bottom-up bulk construction for from_dict()
  • Arena allocator for fast node allocation
  • Structural tree merging for merge()
  • COW semantics for collision nodes
  • Fast iteration with pre-allocated lists

Time Complexity: O(log₃₂ n) ≈ 4 steps for 1M elements

Space Complexity: O(n) with structural sharing across versions
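The depth figures follow directly from the 32-way branching: a trie over n entries is ⌈log₃₂ n⌉ levels deep. A quick sanity check in plain Python:

```python
import math


def trie_depth(n: int, branching: int = 32) -> int:
    """Number of levels a branching-way trie needs for n entries."""
    return max(1, math.ceil(math.log(n, branching)))


print(trie_depth(10**6))  # 4 levels for one million entries
print(trie_depth(10**9))  # 6 levels for one billion
```

In practice this means a lookup touches only a handful of nodes even for very large maps.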

For detailed performance analysis and benchmarking methodology, see docs/.

Installation

pip install pypersistent

Or build from source:

git clone https://github.com/cmarschner/pypersistent.git
cd pypersistent
python setup.py install

Use Cases

Use persistent data structures when:

  • Creating multiple versions of data (undo/redo, time-travel, version history)
  • Sharing data across threads without locks or defensive copying
  • Functional programming patterns (immutability, pure functions)
  • Creating modified copies is frequent (structural sharing makes this fast)
  • Running on Python 3.13+ free-threading, for true parallelism without the GIL

Use mutable collections when:

  • Single mutable instance is sufficient
  • Maximum raw construction speed is critical
  • Memory per entry is highly constrained

Specific use cases:

  • Config management: Share base config, each thread creates customized version
  • Event sourcing: Maintain history of all states efficiently
  • Reactive programming: Pass immutable state between components
  • Concurrent caching: Multiple threads read/update cache without locks
  • Functional data pipelines: Transform data through pipeline stages

How It Works

PyPersistent implements multiple classic persistent data structures:

PersistentDict - Hash Array Mapped Trie (HAMT)

Based on the HAMT data structure used by Clojure, Scala, and Haskell:

  • 32-way branching tree indexed by hash bits
  • Path copying for immutability
  • Structural sharing between versions
  • std::shared_ptr for entry sharing (44x fewer INCREF/DECREF)
  • Inline storage with std::variant for cache-friendly access
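The path-copying idea behind the HAMT can be sketched in plain Python. This is illustrative only: a real HAMT (including pypersistent's) adds bitmap compression of empty slots and hash-collision nodes, both omitted here.

```python
BITS, WIDTH, MASK = 5, 32, 31      # 5 hash bits per level => 32-way nodes
EMPTY = (None,) * WIDTH            # an internal node is a 32-slot tuple


def assoc(node, key, value, shift=0):
    """Return a new trie with key set; the old trie is untouched."""
    idx = (hash(key) >> shift) & MASK
    slot = node[idx]
    if slot is None:                     # free slot: store a (key, value) leaf
        new_slot = (key, value)
    elif len(slot) == WIDTH:             # internal node: recurse one level down
        new_slot = assoc(slot, key, value, shift + BITS)
    elif slot[0] == key:                 # same key: replace the leaf
        new_slot = (key, value)
    else:                                # two keys in one slot: split into a child
        child = assoc(EMPTY, slot[0], slot[1], shift + BITS)
        new_slot = assoc(child, key, value, shift + BITS)
    # Path copying: only this node is rebuilt; all sibling slots are shared.
    return node[:idx] + (new_slot,) + node[idx + 1:]


def get(node, key, shift=0, default=None):
    idx = (hash(key) >> shift) & MASK
    slot = node[idx]
    if slot is None:
        return default
    if len(slot) == WIDTH:
        return get(slot, key, shift + BITS, default)
    return slot[1] if slot[0] == key else default


m1 = assoc(EMPTY, 'name', 'Alice')
m2 = assoc(m1, 'age', 30)
assert get(m1, 'age') is None          # m1 is unchanged
assert get(m2, 'name') == 'Alice' and get(m2, 'age') == 30
```

Each update rebuilds at most one node per level (≈4 nodes for a million entries) and shares everything else, which is why creating variants is so much cheaper than copying.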

PersistentSortedDict - Left-Leaning Red-Black Tree

Self-balancing binary search tree with:

  • Path copying for immutability
  • Sorted order maintenance (O(log₂ n))
  • Atomic reference counting for node sharing
  • Range query support via tree traversal
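Path copying works the same way in a binary search tree. The sketch below uses an unbalanced BST for brevity; PersistentSortedDict layers left-leaning red-black rebalancing on top of the same copy-the-search-path idea (illustrative, not the library's code):

```python
from typing import Any, NamedTuple, Optional


class T(NamedTuple):
    key: Any
    value: Any
    left: Optional["T"]
    right: Optional["T"]


def insert(t: Optional[T], key, value) -> T:
    """Path-copying insert: nodes on the search path are copied;
    every subtree off the path is shared with the old tree."""
    if t is None:
        return T(key, value, None, None)
    if key < t.key:
        return t._replace(left=insert(t.left, key, value))
    if key > t.key:
        return t._replace(right=insert(t.right, key, value))
    return t._replace(value=value)


def items(t: Optional[T]):
    """In-order traversal yields keys in sorted order."""
    if t is not None:
        yield from items(t.left)
        yield (t.key, t.value)
        yield from items(t.right)


t1 = insert(insert(insert(None, 3, 'c'), 1, 'a'), 2, 'b')
t2 = insert(t1, 4, 'd')
assert [k for k, _ in items(t1)] == [1, 2, 3]   # t1 unchanged
assert t2.left is t1.left                        # untouched subtree shared
```

Sorted iteration and range queries fall out of the in-order traversal for free.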

PersistentList - Bit-Partitioned Trie

32-way branching tree for indexed access:

  • Path copying for updates
  • Tail optimization for fast append
  • O(log₃₂ n) random access (~4 steps for 1M elements)
  • Efficient slicing via structural sharing
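The "bit-partitioned" part can be seen by slicing an index into 5-bit chunks; each chunk selects one of 32 children at successive levels of the trie (an illustrative sketch, not the library's traversal code):

```python
BITS, MASK = 5, 31  # 2**5 = 32-way branching


def path_for(index: int, depth: int) -> list:
    """The child slot chosen at each level of the trie, root first."""
    return [(index >> (BITS * level)) & MASK
            for level in reversed(range(depth))]


# Index 1234 in a 3-level trie: 1234 = 1*32**2 + 6*32 + 18
print(path_for(1234, 3))  # [1, 6, 18]
```

Because the path is computed from the index with shifts and masks alone, a get never needs comparisons or hashing.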

PersistentSet - HAMT-based Set

Uses PersistentDict internally with:

  • Keys as set elements
  • Same O(log₃₂ n) complexity
  • Set algebra operations
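The map-backed set can be sketched as a thin wrapper. Illustrative only: a plain dict stands in for the HAMT here, so this sketch copies where the real structure shares.

```python
class PSet:
    """Immutable set stored as the keys of a map (values are a sentinel)."""

    def __init__(self, impl=None):
        self._impl = impl if impl is not None else {}

    def conj(self, item):            # add -> new set
        return PSet({**self._impl, item: True})

    def disj(self, item):            # remove -> new set
        return PSet({k: v for k, v in self._impl.items() if k != item})

    def union(self, other):
        return PSet({**self._impl, **other._impl})

    def __contains__(self, item):
        return item in self._impl


s = PSet().conj(1).conj(2)
assert 1 in s and 3 not in s
assert 2 not in s.disj(2)   # disj returns a new set...
assert 2 in s               # ...the original is untouched
```

Delegating to the map means membership tests, adds, and removes all inherit the map's O(log₃₂ n) bounds.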

PersistentArrayMap - Simple Array

Linear array for tiny dicts (< 8 entries):

  • Lower overhead than HAMT for small sizes
  • O(n) operations but faster than tree for n < 8
  • Automatically used by PersistentDict when beneficial
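A flat key/value array with copy-on-write updates is easy to sketch (illustrative; the THRESHOLD constant here is hypothetical, mirroring the "< 8 entries" cutoff described above):

```python
THRESHOLD = 8  # beyond this, a real implementation promotes to a HAMT


def am_set(pairs: tuple, key, value) -> tuple:
    """Immutable update on a flat (key, value, key, value, ...) tuple."""
    for i in range(0, len(pairs), 2):
        if pairs[i] == key:                        # O(n) linear scan
            return pairs[:i + 1] + (value,) + pairs[i + 2:]
    return pairs + (key, value)                    # key absent: append


def am_get(pairs: tuple, key, default=None):
    for i in range(0, len(pairs), 2):
        if pairs[i] == key:
            return pairs[i + 1]
    return default


m = am_set(am_set((), 'a', 1), 'b', 2)
m2 = am_set(m, 'a', 99)
assert am_get(m, 'a') == 1      # original unchanged
assert am_get(m2, 'a') == 99
```

For a handful of entries, scanning one small contiguous tuple beats chasing tree pointers, which is why the cutoff pays off.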

Technical Details

Complexity:

  • PersistentDict/PersistentSet: O(log₃₂ n) ≈ 4 steps for 1M elements
  • PersistentSortedDict: O(log₂ n) for all operations
  • PersistentList: O(log₃₂ n) get/set, O(1) append
  • PersistentArrayMap: O(n) but fast for n < 8

Implementation:

  • C++ with pybind11 bindings
  • Atomic reference counting (std::atomic<int>)
  • Structural sharing for memory efficiency
  • Thread-safe reads (fully immutable)

Python 3.13+ Free-Threading Support

PyPersistent is fully compatible with Python 3.13's experimental free-threading mode (nogil), making it ideal for parallel workloads:

Why It Works

  1. Lock-Free Reads: Immutable data structure allows concurrent reads without synchronization
  2. Atomic Reference Counting: Internal C++ reference counting uses std::atomic operations
  3. Thread-Safe Entry Storage: Uses std::shared_ptr with built-in thread-safe reference counting
  4. Independent Updates: Each thread can create new versions without blocking others

Usage with Free-Threading

# Python 3.13+ with --disable-gil or PYTHON_GIL=0
import threading
from pypersistent import PersistentDict

# Shared base dict
base_config = PersistentDict.create(
    api_url='https://api.example.com',
    timeout=30,
    retries=3
)

def process_request(thread_id, data):
    # Each thread creates its own version - no locks needed!
    my_config = base_config.set('thread_id', thread_id)
    my_config = my_config.set('request_data', data)

    # Concurrent reads are completely lock-free
    url = my_config['api_url']
    timeout = my_config['timeout']

    # Do work...
    return my_config

# Spawn threads - true parallelism without GIL!
threads = []
for i in range(100):
    t = threading.Thread(target=process_request, args=(i, f'data_{i}'))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

Performance Benefits

Without the GIL:

  • Parallel reads: Multiple threads read simultaneously without contention
  • Parallel updates: Each thread creates new versions independently
  • No lock overhead: Zero synchronization cost for immutable operations
  • Structural sharing shines: Creating 100 thread-local variants is 3000x faster than copying dicts

Enable Free-Threading

# Python 3.13+
python3.13 --disable-gil your_script.py

# Or set environment variable
export PYTHON_GIL=0
python3.13 your_script.py

Note: Free-threading is experimental in Python 3.13. Some packages may not be compatible yet.

Development

# Install development dependencies
pip install pytest pybind11

# Build extension
python setup.py build_ext --inplace

# Run all tests
pytest -v

# Run specific data structure tests
pytest test_persistent_dict.py -v
pytest test_persistent_sorted_dict.py -v
pytest test_persistent_list.py -v
pytest test_persistent_set.py -v

# Run performance benchmarks
python performance_test.py
python performance_vector.py

License

MIT License - see LICENSE file for details

Credits

Inspired by Clojure's persistent data structures and the HAMT paper by Bagwell (2001).

Implementation by Clemens Marschner.
