feat(core): Memory-Mapped Model Loading #159

## Overview

Implement memory-mapped file loading for large SafeTensors models.

## Motivation

Large models (70B+ parameters) often exceed the VRAM of a single GPU. Memory mapping enables:
- Streaming weights from disk to GPU
- Layer-by-layer loading without holding the full model in RAM
- Fast model switching without a full reload

## Features

- [ ] mmap-based SafeTensors reader
- [ ] Lazy tensor loading (load on first access)
- [ ] Pinned-memory staging for faster host-to-device (H2D) transfers
- [ ] Layer eviction for multi-model scenarios
- [ ] Windows/Linux support
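
To make the first two items concrete, here is a minimal sketch of an mmap-based lazy reader using only the Python standard library. It relies on the documented SafeTensors layout (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes, with `data_offsets` relative to the start of the data region). The names `LazySafeTensorsReader` and `write_safetensors` are hypothetical and exist only for this illustration; they are not part of pygpukit or the `safetensors` package, and the writer skips the alignment padding a real file may contain.

```python
import json
import mmap
import os
import struct
import tempfile


def write_safetensors(path, tensors):
    """Demo helper (hypothetical): write {name: (dtype, shape, raw_bytes)}."""
    header, offset = {}, 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(raw)]}
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte LE header size
        f.write(header_bytes)
        for _, _, raw in tensors.values():
            f.write(raw)


class LazySafeTensorsReader:
    """Hypothetical reader: parses the header eagerly, tensor bytes lazily."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # ACCESS_READ is available on both Windows and Linux.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        (header_len,) = struct.unpack("<Q", self._mm[:8])
        self.header = json.loads(self._mm[8:8 + header_len].decode("utf-8"))
        self.header.pop("__metadata__", None)  # optional free-form metadata
        self._data_start = 8 + header_len
        self._view = memoryview(self._mm)
        self._cache = {}  # tensor name -> zero-copy view, filled on first access

    def loaded(self):
        return list(self._cache)

    def get(self, name):
        if name not in self._cache:
            begin, end = self.header[name]["data_offsets"]
            start = self._data_start
            # Slicing a memoryview does not copy; pages are read when touched.
            self._cache[name] = self._view[start + begin:start + end]
        return self._cache[name]

    def close(self):
        # Views must be released before the mapping can be closed.
        for view in self._cache.values():
            view.release()
        self._cache.clear()
        self._view.release()
        self._mm.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "tiny.safetensors")
    write_safetensors(path, {
        "embed.weight": ("F32", [2, 2], struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)),
        "blocks.0.weight": ("F32", [2], struct.pack("<2f", 5.0, 6.0)),
    })
    reader = LazySafeTensorsReader(path)
    loaded_before = reader.loaded()            # nothing touched yet
    values = struct.unpack("<2f", reader.get("blocks.0.weight"))
    loaded_after = reader.loaded()             # only the requested tensor
    tensor_names = sorted(reader.header)
    reader.close()                             # close before the file is deleted
```

Returning `memoryview` slices keeps the access zero-copy; a real implementation would presumably wrap them in the project's own array type before the H2D copy.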

## Design

```python
from pygpukit.llm import LazyModel  # proposed API, not yet implemented

# Model weights stay on disk until accessed
model = LazyModel.from_safetensors("path/to/model", lazy=True)

# Loads only the embedding weights (input_ids: token IDs supplied by the caller)
embeddings = model.embed_tokens(input_ids)

# Loads layer 0 on demand
hidden = model.blocks[0](embeddings)
```
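
On-demand loading pairs naturally with the layer-eviction item in the feature list. The sketch below shows one possible policy, least-recently-used (LRU) eviction with a fixed number of resident layers. The policy, the `LayerCache` class, and the `fake_load` callback are all assumptions made for illustration; this issue does not specify an eviction strategy, and a real implementation would free device memory where this sketch simply drops a reference.

```python
from collections import OrderedDict


class LayerCache:
    """Hypothetical LRU cache: keeps at most `max_resident` layers loaded."""

    def __init__(self, load_layer, max_resident):
        self._load_layer = load_layer      # callback: layer index -> weights
        self._max_resident = max_resident
        self._resident = OrderedDict()     # ordered oldest -> most recent

    def get(self, index):
        if index in self._resident:
            # Cache hit: mark the layer as most recently used.
            self._resident.move_to_end(index)
            return self._resident[index]
        layer = self._load_layer(index)    # cache miss: load on demand
        self._resident[index] = layer
        if len(self._resident) > self._max_resident:
            # Evict the least recently used layer.
            self._resident.popitem(last=False)
        return layer

    def resident(self):
        return list(self._resident)


load_log = []


def fake_load(index):
    # Stand-in for reading a layer's tensors from the memory-mapped file.
    load_log.append(index)
    return f"weights-for-layer-{index}"


cache = LayerCache(fake_load, max_resident=2)
for i in (0, 1, 0, 2, 1):
    cache.get(i)

print(load_log)          # layers actually loaded, in order
print(cache.resident())  # layers still resident, oldest first
```

In this access pattern layer 1 is evicted when layer 2 arrives and must be reloaded afterwards, which illustrates the trade-off between the resident budget and reload cost.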

## Related

- SafeTensors mmap support
- HuggingFace Accelerate disk offload
