Overview
Add snapshot, export, import, and clone functionality to BoxLite.
PR #217 by @joeyaflores provides a solid foundation (disk filename constants, auto-migration, and the
resolve_stopped_box helper). However, the API architecture needs rework to match the design below; see Design Deviations from PR #217 at the bottom.
API Design (Final)
┌──────────────────────────────────────────────────────────────────────┐
│ FINAL API (8 methods) │
├──────────────────────────────────────────────────────────────────────┤
│ LiteBox │ BoxliteRuntime │
│ ─────────────────────────────────────│──────────────────────────────│
│ box.snapshot().create(name, opts) │ runtime.import(archive, name) │
│ box.snapshot().list() │ │
│ box.snapshot().get(name) │ │
│ box.snapshot().remove(name) │ │
│ box.snapshot().restore(name) │ │
│ box.export(dest, opts) │ │
│ box.clone(name, opts) │ │
└──────────────────────────────────────────────────────────────────────┘
Ownership principle:
- Has source box handle → box method (clone, export, snapshot.*)
- No source box → runtime method (import creates from external archive)
Why sub-resource snapshot()?
- Resolves @uran0sH's naming confusion (snapshot verb vs snapshots noun)
- Clear CRUD semantics: create/list/get/remove/restore
- Zero-cost in Rust (SnapshotHandle is just a &LiteBox reference)
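The zero-cost point can be checked with a small std-only sketch. Only the shape of `SnapshotHandle` (a wrapper around `&LiteBox`) comes from the proposal; the `LiteBox` stand-in, its `name` field, and the `describe` helper are hypothetical and exist purely for illustration.

```rust
use std::mem::size_of;

/// Stand-in for the real box type; the `name` field is a placeholder.
pub struct LiteBox {
    name: String,
}

/// Borrowed view over a box: a single reference, as proposed.
pub struct SnapshotHandle<'a> {
    litebox: &'a LiteBox,
}

impl LiteBox {
    pub fn new(name: &str) -> Self {
        LiteBox { name: name.to_string() }
    }

    /// Creating the handle only copies a pointer; nothing is allocated.
    pub fn snapshot(&self) -> SnapshotHandle<'_> {
        SnapshotHandle { litebox: self }
    }
}

impl<'a> SnapshotHandle<'a> {
    /// Hypothetical helper, used only to show the handle reaching box state.
    pub fn describe(&self, snapshot: &str) -> String {
        format!("{}@{}", self.litebox.name, snapshot)
    }
}

/// True when the handle is exactly as large as a plain `&LiteBox`.
pub fn handle_is_pointer_sized() -> bool {
    size_of::<SnapshotHandle<'static>>() == size_of::<&'static LiteBox>()
}
```

Because the handle borrows the box, the borrow checker also guarantees it cannot outlive the `LiteBox` it was created from.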
Why clone on box, not runtime?
- Box already has disk paths and config, so there is no need to pass a source name string
- Prevents typos: rt.clone("tset", ...) is impossible with box.clone(...)
- Fan-out is natural: [box.clone(f"w-{i}", opts) for i in range(10)]
Rust API
impl LiteBox {
    pub fn snapshot(&self) -> SnapshotHandle<'_>;
    pub async fn export(&self, dest: &Path, opts: ExportOptions) -> BoxliteResult<PathBuf>;
    pub async fn clone(&self, name: &str, opts: CloneOptions) -> BoxliteResult<LiteBox>;
}
pub struct SnapshotHandle<'a> { litebox: &'a LiteBox }
impl<'a> SnapshotHandle<'a> {
    pub async fn create(&self, name: &str, opts: SnapshotOptions) -> BoxliteResult<SnapshotInfo>;
    pub async fn list(&self) -> BoxliteResult<Vec<SnapshotInfo>>;
    pub async fn get(&self, name: &str) -> BoxliteResult<Option<SnapshotInfo>>;
    pub async fn remove(&self, name: &str) -> BoxliteResult<()>;
    pub async fn restore(&self, name: &str) -> BoxliteResult<()>;
}
impl BoxliteRuntime {
    pub async fn import(&self, archive: &Path, name: &str) -> BoxliteResult<LiteBox>;
}
Options (RocksDB-inspired)
All option types implement the Default trait and expose builder methods returning &mut Self.
pub struct SnapshotOptions {
    quiesce: bool,                 // Default: true (FIFREEZE before snapshot)
    quiesce_timeout_secs: u64,     // Default: 30
    stop_on_quiesce_fail: bool,    // Default: true
}
pub struct ExportOptions {
    compress: bool,                // Default: true (zstd)
    compression_level: i32,        // Default: 3 (zstd 1-22)
    include_metadata: bool,        // Default: true
}
pub struct CloneOptions {
    cow: bool,                     // Default: true (QCOW2 COW, ~1ms per clone)
    start_after_clone: bool,       // Default: false
    from_snapshot: Option<String>, // Default: None (clone from specific snapshot)
}
Why from_snapshot? (addresses @IANTHEREAL Q1)
- Fan-out 10 workers from same snapshot: box.clone("w-1", CloneOptions::default().from_snapshot("v1"))
- Without this: restore → clone → restore back (3 ops instead of 1)
- 10 parallel COW clones ~3ms wall time
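A minimal sketch of the options pattern described above, assuming `Default` plus `&mut Self` builders exactly as stated. The accessors `snapshot_name` and `is_cow` and the helpers `worker_options` and `worker_names` are hypothetical additions for the example. One detail worth noting under this reading: a chained call on a temporary, such as `CloneOptions::default().from_snapshot("v1")`, evaluates to `&mut CloneOptions`, so passing it to a parameter typed `CloneOptions` would need a named binding (as below) or a `.clone()`.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct CloneOptions {
    cow: bool,
    start_after_clone: bool,
    from_snapshot: Option<String>,
}

impl Default for CloneOptions {
    fn default() -> Self {
        // Defaults as listed in the proposal.
        CloneOptions { cow: true, start_after_clone: false, from_snapshot: None }
    }
}

impl CloneOptions {
    pub fn cow(&mut self, enabled: bool) -> &mut Self {
        self.cow = enabled;
        self
    }

    pub fn start_after_clone(&mut self, enabled: bool) -> &mut Self {
        self.start_after_clone = enabled;
        self
    }

    pub fn from_snapshot(&mut self, name: &str) -> &mut Self {
        self.from_snapshot = Some(name.to_string());
        self
    }

    /// Hypothetical read accessors, added for the example only.
    pub fn snapshot_name(&self) -> Option<&str> {
        self.from_snapshot.as_deref()
    }

    pub fn is_cow(&self) -> bool {
        self.cow
    }
}

/// Options for one fan-out worker cloned from `snapshot`.
pub fn worker_options(snapshot: &str) -> CloneOptions {
    let mut opts = CloneOptions::default();
    opts.from_snapshot(snapshot);
    opts
}

/// Names for `count` workers: "w-0", "w-1", and so on.
pub fn worker_names(count: usize) -> Vec<String> {
    (0..count).map(|i| format!("w-{i}")).collect()
}
```

Each worker would then receive its own copy of the options, for example `box.clone(&name, opts.clone())` per name, which is one operation per clone rather than the restore, clone, restore-back sequence.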
Types
pub struct SnapshotInfo {
    pub name: String,
    pub created_at: DateTime<Utc>,
    pub size_bytes: u64,             // Total (both disks)
    pub guest_disk_bytes: u64,
    pub container_disk_bytes: u64,
}
pub struct ArchiveManifest {
    pub version: u32,
    pub box_name: Option<String>,
    pub image: String,
    pub guest_disk_checksum: String, // "sha256:..."
    pub container_disk_checksum: String,
    pub exported_at: DateTime<Utc>,
}
Key Design Decisions
- Two disks captured: both guest-rootfs.qcow2 and disk.qcow2 (entire filesystem, @IANTHEREAL Q2)
- Disk-only state: no memory/CPU state, clean boot on restore
- External COW snapshots: separate files per snapshot in snapshots/{name}/, not QCOW2 internal snapshots
- COW clone by default: uses existing create_cow_child_disk(), no qemu-img dependency
- remove, not delete: consistent with existing runtime.remove()
- Naming reviewed against BoxLite conventions, RocksDB, containerd, libgit2, PostgreSQL, Kubernetes
Disk Layout
~/.boxlite/boxes/{box_id}/
├── guest-rootfs.qcow2 # Guest VM disk
├── disk.qcow2 # Container disk
└── snapshots/{name}/
├── guest-rootfs.qcow2 # QCOW2 COW snapshot
└── disk.qcow2 # QCOW2 COW snapshot
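The box directory layout above maps naturally onto a pair of path helpers. This is a sketch only: the constant names, the function names, and the name-validation rule are assumptions, not the constants from PR #217. The validation is included because a snapshot name becomes a directory component, so a name such as `../x` would otherwise escape `snapshots/`.

```rust
use std::path::{Path, PathBuf};

// File names taken from the layout; the constant names are hypothetical.
pub const GUEST_DISK: &str = "guest-rootfs.qcow2";
pub const CONTAINER_DISK: &str = "disk.qcow2";

/// Hypothetical rule: a snapshot name must be one plain path component.
pub fn is_valid_snapshot_name(name: &str) -> bool {
    !name.is_empty()
        && name != "."
        && name != ".."
        && !name.contains('/')
        && !name.contains('\\')
}

/// `<box_dir>/snapshots/<name>`, or `None` for an unsafe name.
pub fn snapshot_dir(box_dir: &Path, name: &str) -> Option<PathBuf> {
    if is_valid_snapshot_name(name) {
        Some(box_dir.join("snapshots").join(name))
    } else {
        None
    }
}

/// Paths of the (guest, container) disks inside one snapshot directory.
pub fn snapshot_disks(box_dir: &Path, name: &str) -> Option<(PathBuf, PathBuf)> {
    let dir = snapshot_dir(box_dir, name)?;
    Some((dir.join(GUEST_DISK), dir.join(CONTAINER_DISK)))
}
```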
Archive (.boxsnap):
archive.boxsnap (tar.zst)
├── manifest.json
├── guest-rootfs.qcow2 # Flattened (standalone)
└── disk.qcow2 # Flattened (standalone)
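On import, the manifest checksums would presumably be compared against digests of the extracted disks. The sketch below covers only the string handling for the `"sha256:..."` field format shown in `ArchiveManifest`. Computing the SHA-256 itself is left to whichever hashing crate the project uses, since the standard library has none, and the names `Checksum`, `parse_checksum`, and `checksum_matches` are hypothetical.

```rust
/// Parsed form of a manifest checksum such as "sha256:<64 hex chars>".
#[derive(Debug, PartialEq, Eq)]
pub struct Checksum<'a> {
    pub algorithm: &'a str,
    pub hex_digest: &'a str,
}

/// Accepts only "sha256:" followed by exactly 64 hexadecimal characters.
pub fn parse_checksum(field: &str) -> Option<Checksum<'_>> {
    let (algorithm, hex_digest) = field.split_once(':')?;
    let well_formed = algorithm == "sha256"
        && hex_digest.len() == 64
        && hex_digest.bytes().all(|b| b.is_ascii_hexdigit());
    if well_formed {
        Some(Checksum { algorithm, hex_digest })
    } else {
        None
    }
}

/// Compares a digest computed by the caller against the manifest field,
/// ignoring hex letter case. A malformed field never matches.
pub fn checksum_matches(field: &str, computed_hex: &str) -> bool {
    match parse_checksum(field) {
        Some(checksum) => checksum.hex_digest.eq_ignore_ascii_case(computed_hex),
        None => false,
    }
}
```

Rejecting malformed fields outright, rather than treating them as "no checksum", keeps a corrupted manifest from silently skipping verification.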
Database Schema
CREATE TABLE IF NOT EXISTS box_snapshot (
    id TEXT PRIMARY KEY NOT NULL,
    box_id TEXT NOT NULL,
    name TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    snapshot_dir TEXT NOT NULL,
    guest_disk_size_bytes INTEGER NOT NULL,
    container_disk_size_bytes INTEGER NOT NULL,
    size_bytes INTEGER NOT NULL DEFAULT 0,
    FOREIGN KEY (box_id) REFERENCES box_config(id) ON DELETE CASCADE,
    UNIQUE(box_id, name)
);
Thread Safety
pub enum BoxStatus {
    Stopped, Running, Snapshotting, Restoring, Exporting,
}
| Operation | Allowed From | Blocks Others |
|---|---|---|
| snapshot().create() | Stopped, Running | Yes |
| snapshot().list/get() | Any | No |
| snapshot().restore() | Stopped only | Yes |
| snapshot().remove() | Stopped only | Yes |
| export() | Stopped, Running | Yes |
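The table can be encoded directly as a pair of predicates. In this sketch the `BoxStatus` variants come from the proposal, while the `Operation` enum and both function names are hypothetical. It covers only the operations the table lists.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BoxStatus {
    Stopped,
    Running,
    Snapshotting,
    Restoring,
    Exporting,
}

/// Hypothetical enumeration of the operations listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operation {
    SnapshotCreate,
    SnapshotList,
    SnapshotGet,
    SnapshotRestore,
    SnapshotRemove,
    Export,
}

/// "Allowed From" column: may `op` start while the box is in `status`?
pub fn is_allowed(op: Operation, status: BoxStatus) -> bool {
    match op {
        // Read-only queries are allowed from any status.
        Operation::SnapshotList | Operation::SnapshotGet => true,
        Operation::SnapshotCreate | Operation::Export => {
            matches!(status, BoxStatus::Stopped | BoxStatus::Running)
        }
        Operation::SnapshotRestore | Operation::SnapshotRemove => {
            status == BoxStatus::Stopped
        }
    }
}

/// "Blocks Others" column: everything except the read-only queries.
pub fn blocks_others(op: Operation) -> bool {
    !matches!(op, Operation::SnapshotList | Operation::SnapshotGet)
}
```

Since the transient statuses (`Snapshotting`, `Restoring`, `Exporting`) appear in no "Allowed From" cell other than "Any", a second blocking operation is rejected while one is in flight.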
Python Usage
from boxlite import Boxlite, BoxOptions, SnapshotOptions, ExportOptions, CloneOptions
import asyncio

async def main():
    async with Boxlite.default() as rt:
        box = await rt.create(BoxOptions(image='alpine'), name='test')
        # ── Snapshot CRUD ──
        await box.snapshot.create("v1")
        await box.snapshot.create("v2", SnapshotOptions(quiesce_timeout_secs=60))
        snaps = await box.snapshot.list()
        info = await box.snapshot.get("v1")
        await box.snapshot.restore("v1")
        await box.snapshot.remove("v2")
        # ── Export / Import ──
        archive = await box.export("/tmp/backup.boxsnap")
        new_box = await rt.import_box("/tmp/backup.boxsnap", "restored")
        # ── Clone (current state) ──
        clone = await box.clone("my-clone")
        # ── Fan-out from snapshot (sub-second!) ──
        opts = CloneOptions(from_snapshot="v1")
        workers = await asyncio.gather(*[
            box.clone(f"worker-{i}", opts) for i in range(10)
        ])

asyncio.run(main())
Implementation Steps
| Priority | Module | Files |
|---|---|---|
| 1 | Types, Options, Handle | boxlite/src/snapshot/{mod,handle,options,types}.rs |
| 2 | Database & Disk | boxlite/src/db/{schema,snapshots}.rs, disk/qcow2.rs |
| 3 | LiteBox & Runtime | litebox/box_impl.rs, runtime/core.rs |
| 4 | Guest Quiesce | guest/src/quiesce.rs (FIFREEZE/FITHAW) |
| 5 | Python SDK | sdks/python/src/{snapshot,box_handle,runtime,options}.rs |
Reviewer Questions Addressed
| Question | Answer |
|---|---|
| @shayne-snap: Running box? | Quiesce (FIFREEZE), falls back to stop |
| @shayne-snap: After restore? | Box stays stopped, user calls start() |
| @shayne-snap: Delete cascade? | Yes, ON DELETE CASCADE |
| @IANTHEREAL: Clone from snapshot? | CloneOptions(from_snapshot="v1") |
| @IANTHEREAL: Snapshot boundary? | Entire filesystem (both disks) |
| @IANTHEREAL: Independent lifecycle? | Future enhancement (CASCADE for now) |
| @uran0sH: Naming confusion? | Sub-resource: box.snapshot().create/list/get/remove/restore |
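The answer on running boxes (quiesce, then fall back to stop) can be read as a small decision function over `SnapshotOptions`. The sketch below is one possible interpretation, not confirmed behavior. The `SnapshotPlan` enum and `plan_snapshot` are hypothetical, the fields are made public only to keep the example short, and two assumptions are made: a running box with `quiesce` disabled is captured as-is, and a failed freeze with `stop_on_quiesce_fail` disabled aborts.

```rust
pub struct SnapshotOptions {
    pub quiesce: bool,
    pub quiesce_timeout_secs: u64,
    pub stop_on_quiesce_fail: bool,
}

impl Default for SnapshotOptions {
    fn default() -> Self {
        // Defaults as listed in the proposal.
        SnapshotOptions { quiesce: true, quiesce_timeout_secs: 30, stop_on_quiesce_fail: true }
    }
}

/// Hypothetical outcome of deciding how to capture the disks.
#[derive(Debug, PartialEq, Eq)]
pub enum SnapshotPlan {
    /// Box is stopped, or quiesce is disabled (assumption): capture as-is.
    CaptureAsIs,
    /// FIFREEZE succeeded: capture, then FITHAW.
    CaptureFrozen,
    /// Freeze failed or timed out: stop the box, then capture.
    StopThenCapture,
    /// Freeze failed and stopping is not permitted (assumption): give up.
    Abort,
}

/// `freeze_succeeded` is the result of the FIFREEZE attempt within the
/// configured timeout; it is ignored when no freeze is attempted.
pub fn plan_snapshot(
    running: bool,
    opts: &SnapshotOptions,
    freeze_succeeded: bool,
) -> SnapshotPlan {
    if !running || !opts.quiesce {
        return SnapshotPlan::CaptureAsIs;
    }
    if freeze_succeeded {
        SnapshotPlan::CaptureFrozen
    } else if opts.stop_on_quiesce_fail {
        SnapshotPlan::StopThenCapture
    } else {
        SnapshotPlan::Abort
    }
}
```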
Design Deviations from PR #217
PR #217 by @joeyaflores is a great first implementation. Here's what to adopt and what needs rework:
✅ Adopt from PR #217
- Disk filename constants (disk/constants.rs)
- Auto-migration support (v4→v5)
- resolve_stopped_box() helper
- guest_rootfs_disk_path() on layout
- DB test patterns
🔧 Needs Rework
| PR #217 | Target Design | Why |
|---|---|---|
| All methods on BoxliteRuntime | Methods on LiteBox + sub-resource | Box handle already has context; avoids string-based lookup |
| Flat API (snapshot(), list_snapshots()) | Sub-resource box.snapshot().create/list/... | Resolves naming confusion, clean CRUD |
| No Options types | SnapshotOptions, ExportOptions, CloneOptions | Extensible without breaking changes |
| Full copy only (qemu-img convert) | COW clone default (create_cow_child_disk()) | 1000x perf: ~1ms vs ~seconds per clone |
| External qemu-img dependency | In-process QCOW2 operations | No new dependencies |
| QCOW2 internal snapshots | External COW files in snapshots/{name}/ | Better for clone-from-snapshot, size tracking |
| duplicate() | clone() | Industry standard; LiteBox doesn't impl Rust Clone |
| delete_snapshot() | remove() | Consistent with runtime.remove() |
| SnapshotRecord | SnapshotInfo | Consistent with BoxInfo pattern |
| snapshots table | box_snapshot table | Consistent with box_config, box_state |
| .boxlite extension | .boxsnap extension | Avoids conflict with project name |
| No compression | tar.zst with configurable level | Smaller archives |
| No checksums | SHA-256 checksums in manifest | Integrity verification |