diff --git a/CLAUDE.md b/CLAUDE.md index 3a1ae8a1..8f46753b 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -20,7 +20,7 @@ HYTOPIA is a multiplayer voxel game engine monorepo. The **server** (TypeScript/ - **Physics**: Rapier3D (`@dimforge/rapier3d-simd-compat`) at 60 Hz, default gravity `y = -32` - **Networking**: WebTransport (QUIC) preferred, WebSocket fallback. Packets serialized with msgpackr, large payloads gzip-compressed - **Protocol**: `protocol/` defines all packet schemas (AJV-validated). Published as `@hytopia.com/server-protocol` -- **Rendering**: Three.js `WebGLRenderer` + `MeshBasicMaterial` (no dynamic lights). Post-processing: SMAA, bloom, outline. Chunk meshes built in Web Worker via greedy meshing + AO +- **Rendering**: Three.js `WebGLRenderer` + `MeshBasicMaterial` (no dynamic lights). Post-processing: SMAA, bloom, outline. Chunk meshes built in a Web Worker with face culling + AO (no greedy quad merging on `master`) - **Persistence**: `@hytopia.com/save-states` for player/global KV data - **Singleton pattern**: Most server systems use `ClassName.instance`; client systems owned by `Game` singleton diff --git a/CODEBASE_DOCUMENTATION.md b/CODEBASE_DOCUMENTATION.md index d19d5bf8..0c9d7f41 100644 --- a/CODEBASE_DOCUMENTATION.md +++ b/CODEBASE_DOCUMENTATION.md @@ -185,7 +185,7 @@ blocks/BlockTextureAtlasManager.ts - Texture atlas generation blocks/utils.ts - Block utilities chunks/Chunk.ts - Client chunk state chunks/ChunkManager.ts - Chunk lifecycle (load/unload by distance) -chunks/ChunkMeshManager.ts - Greedy meshing + AO for voxel geometry +chunks/ChunkMeshManager.ts - Batch meshes from worker output (per-face meshing with face culling + AO on `master`) chunks/ChunkRegistry.ts - Chunk lookup chunks/ChunkConstants.ts - Chunk size constants chunks/ChunkStats.ts - Chunk performance stats @@ -423,4 +423,4 @@ zombies-fps/ - Zombie FPS - **Dual transport** — WebTransport (QUIC) preferred, WebSocket fallback. 
Reliable stream + unreliable datagrams - **msgpackr serialization** — All packets serialized with msgpackr, large payloads gzip-compressed - **60 Hz physics / 30 Hz network** — Server physics ticks at 60 Hz, network sync flushes every 2 ticks -- **Web Worker meshing** — Client offloads greedy meshing + AO to a dedicated Web Worker +- **Web Worker meshing** — Client offloads chunk meshing + AO to a dedicated Web Worker (per-face meshing with face culling; no greedy quad merging on `master`) diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/FINDINGS.md b/ai-memory/docs/perf-external-notes-2026-03-05/FINDINGS.md new file mode 100644 index 00000000..ad698935 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/FINDINGS.md @@ -0,0 +1,80 @@ +# External Notes vs. HYTOPIA Source (Verification + PR Cross-Check) + +Base reference for verification in this branch: `origin/master` at `24a295d` (2026-03-05). + +## What Was Imported + +Unmodified external notes live in `ai-memory/docs/perf-external-notes-2026-03-05/raw/`. + +## Quick Take + +The external docs mix: + +- **Accurate observations about the current client** (notably: face culling exists; greedy meshing does not; geometry churn is high; packet decompression is synchronous). +- **Roadmap/architecture assumptions that do not match `master`** (procedural streaming, time-budgeted collider queues, LOD/occlusion/face-limit systems, several referenced constants/functions). + +So: use them as *idea input*, but treat many “current state” statements as unverified unless they point to code that exists on `master`. + +## Claim Verification (Against `master`) + +### Client meshing/rendering + +- ✅ **Face culling exists**: `client/src/workers/ChunkWorker.ts` culls faces when neighbor blocks are solid/opaque. +- ❌ **Greedy meshing is not implemented**: `client/src/workers/ChunkWorker.ts` emits per-face quads (4 vertices per visible face) with no quad merging pass. 
+- ❌ **Vertex pooling is not present**: `client/src/chunks/ChunkMeshManager.ts` creates a new `BufferGeometry` for each batch update and disposes the old geometry. +- ❌ **LOD / cave occlusion / “face limit safety caps” described in notes are not found** via repo search on `client/src/` (`lod`, `occlusion`, face-count thresholds, BFS visibility, etc.). + +### Client networking + +- ✅ **Synchronous gzip decompression on the main thread**: `client/src/network/NetworkManager.ts` calls `gunzipSync` (fflate) before msgpack decode. + +### Server networking (entity/chunk sync) + +- ✅ **Entity pos/rot are a dominant sync path (and split to unreliable when pos/rot-only)**: `server/src/networking/NetworkSynchronizer.ts`. +- ❌ **No entity quantization/delta fields exist today**: `protocol/schemas/Entity.ts` has only `p` (Vector) and `r` (Quaternion). `server/src/networking/Serializer.ts` serializes full float arrays. +- ❌ **No chunk pacing/segmentation is implemented**: `server/src/networking/NetworkSynchronizer.ts` batches *all queued chunk syncs* into a single packet each sync. + +### Server colliders / chunk streaming + +Several external docs reference a *procedural streaming* pipeline (chunks-per-tick, queued collider chunk processing, async region I/O). Those specific codepaths/constants (e.g. `CHUNKS_PER_TICK`, `processPendingColliderChunks`, `COLLIDER_MAX_CHUNK_DISTANCE`, `server/src/worlds/maps/*`) are **not present on `master`**. + +## Notable Errors / Corrections in the Notes + +- **Quantized position range math is wrong as written**: + - If you encode `pq = round(x * 256)` into **int16**, the representable world range is about **±128 blocks** (32767 / 256), not ±32768 blocks. + - To keep **1/256 block precision** over large worlds, you need larger integers (e.g. int32) or chunk-relative encoding. Coarser quantization (e.g. 1/16 block, about ±2048 blocks in int16) extends the range only by giving up precision.
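The arithmetic behind this correction, as a small sketch: a signed int16 spans −32768..32767, so at 256 steps per block the largest encodable coordinate is 32767 / 256 ≈ 127.996 blocks. The chunk-relative variant is one of the options listed above (a small chunk index plus an in-chunk offset that always fits in int16). All helper names here are hypothetical and do not exist in the codebase.

```typescript
// Range/precision check for fixed-point position encodings.
// Hypothetical helpers for illustration only; not part of the HYTOPIA codebase.

const INT16_MAX = 32767;
const INT16_MIN = -32768;

/** Largest world coordinate (in blocks) a signed int16 can hold at the given scale. */
function maxRangeBlocks(quantScale: number): number {
  return INT16_MAX / quantScale;
}

/** Quantize to int16, throwing instead of silently overflowing. */
function quantizeInt16(x: number, quantScale: number): number {
  const q = Math.round(x * quantScale);
  if (q < INT16_MIN || q > INT16_MAX) {
    throw new RangeError(`coordinate ${x} is out of int16 range at scale ${quantScale}`);
  }
  return q;
}

/**
 * Chunk-relative encoding: chunk index plus an int16 offset inside the chunk.
 * The offset is in [0, chunkSize * quantScale], so 1/256-block precision holds anywhere.
 */
function quantizeChunkRelative(x: number, quantScale = 256, chunkSize = 16): [number, number] {
  const chunk = Math.floor(x / chunkSize);
  const offset = quantizeInt16(x - chunk * chunkSize, quantScale);
  return [chunk, offset];
}

function dequantizeChunkRelative(chunk: number, offset: number, quantScale = 256, chunkSize = 16): number {
  return chunk * chunkSize + offset / quantScale;
}

console.log(maxRangeBlocks(256)); // 127.99609375
console.log(maxRangeBlocks(16)); // 2047.9375
```

The chunk-relative form costs an extra integer per axis, but that integer is small and msgpack encodes small integers compactly.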
+ +## How This Relates to Your Performance PRs + +PRs authored by you that touch performance (as of 2026-03-05): + +- #2 (OPEN) `analysis/codebase-audit`: https://github.com/web3dev1337/hytopia-source/pull/2 +- #3 (OPEN) `docs/iphone-pro-performance-analysis`: https://github.com/web3dev1337/hytopia-source/pull/3 +- #4 (OPEN) `fix/fps-cap-medium-low`: https://github.com/web3dev1337/hytopia-source/pull/4 +- #5 (OPEN) `fix/cap-mobile-dpr`: https://github.com/web3dev1337/hytopia-source/pull/5 +- #6 (OPEN) `feature/map-compression`: https://github.com/web3dev1337/hytopia-source/pull/6 +- #7 (OPEN) `review/mirror-upstream-pr-9`: https://github.com/web3dev1337/hytopia-source/pull/7 +- #8 (OPEN) `review/mirror-upstream-pr-10` (stacked on #7): https://github.com/web3dev1337/hytopia-source/pull/8 +- #9 (OPEN) `review/mirror-upstream-pr-11`: https://github.com/web3dev1337/hytopia-source/pull/9 +- #10 (CLOSED) `fix/cap-mobile-devicepixelratio` (superseded): https://github.com/web3dev1337/hytopia-source/pull/10 + +Where they overlap with the external notes: + +- **High-DPI / mobile GPU load**: + - #4 adds a 60 FPS cap for MEDIUM/LOW (matches the “uncapped 120Hz” problem described in #3). + - #5 caps mobile pixel ratio (matches the “3x DPR” issue described in #3). + - #9 introduces a **pixel-budget-based** effective pixel ratio and reduces outline overhead (complementary to #3). +- **Outline pass overhead**: + - #9 removes per-mesh define mutation in `SelectiveOutlinePass` by prebuilding shader variants (reduces CPU/shader churn). It does **not** reduce the outline shader’s sampling cost. +- **View-distance mesh visibility**: + - `master` currently iterates all batch meshes each frame. #9 adds cached visibility sets and updates visibility only when the camera crosses a “cell” boundary or settings change. +- **Map size / load time**: + - #6 (compressed maps) addresses the external “JSON map size” concern; the external “binary streaming maps” discussion is broader than #6’s scope.
+ +## What’s Still Missing (Relative to the External Notes + Your PRs) + +- **Greedy meshing / quad merging** in `client/src/workers/ChunkWorker.ts`. +- **Entity sync quantization / deltas / distance-based rates** (protocol + serializer + client deserializer work). +- **Chunk packet pacing/segmentation** to avoid bursty chunk arrays at join / fast movement. +- **Off-main-thread decompression/decoding** for network payloads (or reduced use of sync `gunzipSync`). + diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/README.md b/ai-memory/docs/perf-external-notes-2026-03-05/README.md new file mode 100644 index 00000000..7be1a953 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/README.md @@ -0,0 +1,6 @@ +# External Performance Notes (Imported) + +These documents were copied from the Windows mount (`/mnt/c/Users/AB/Downloads`) on **2026-03-05** and are treated as *unverified external notes*. + +- Canonical copies live in `raw/`. +- Some downloads existed under duplicate filenames with ` (1)` suffixes; those duplicates are preserved under `raw/duplicates/` for traceability. diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/COLLIDER_ARCHITECTURE_RESEARCH.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/COLLIDER_ARCHITECTURE_RESEARCH.md new file mode 100644 index 00000000..3fe81f1a --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/COLLIDER_ARCHITECTURE_RESEARCH.md @@ -0,0 +1,159 @@ +# Collider Architecture Research + +**Purpose:** Guide the refactor of Hytopia’s block collider system from O(world) to O(nearby chunks). +**Audience:** Engineers implementing Phase 1 (Collider Locality) and Phase 2 (Incremental Voxel Updates). + +--- + +## 1. Current Architecture + +### 1.1 Block Type → Collider + +- One collider per **block type** (dirt, stone, etc.), not per block. +- Voxel collider: Rapier voxel grid; each cell = block present/absent.
+- Trimesh collider: Used for non-cube blocks; rebuilt when any block of that type changes. + +### 1.2 Critical Path + +``` +setBlock / addChunkBlocks + → _addBlockTypePlacement + → _getBlockTypePlacements() // iterates ALL chunks of this block type + → _combineVoxelStates(collider) // merges placements into voxel grid + → collider.addToSimulation / setVoxel +``` + +**Problem:** `_getBlockTypePlacements` and `_combineVoxelStates` touch every chunk that contains the block type. As world size grows, this becomes O(world). + +--- + +## 2. Target Architecture: Spatial Locality + +### 2.1 Principle + +- Colliders should only include blocks from chunks **within N chunks of any player** (e.g. N=4). +- When a chunk unloads (player moves away), remove its blocks from colliders. +- When a chunk loads, add its blocks to colliders only if it’s within the active radius. + +### 2.2 Data Structure Change + +**Current:** `_blockTypePlacements` is global (or implicitly spans all chunks). + +**Target:** Maintain a **spatial index**: + +```ts +// Chunk key (bigint) → for each block type in that chunk: Set of global coordinates +private _chunkBlockPlacements: Map>> = new Map(); + +// Active chunk keys: chunks within COLLIDER_RADIUS of any player +private _activeColliderChunkKeys: Set = new Set(); +``` + +- On chunk load: add chunk key to index; add block placements. +- On chunk unload: remove chunk key; remove blocks from colliders. +- `_getBlockTypePlacements` for collider: only return placements from `_activeColliderChunkKeys`. +- `_combineVoxelStates`: only iterate over placements from active chunks. + +### 2.3 Update Flow + +``` +Player moves + → Update _activeColliderChunkKeys (chunks within radius) + → For chunks that left radius: remove from colliders + → For chunks that entered radius: add to colliders + → _combineVoxelStates only over active placements +``` + +--- + +## 3. 
Incremental Voxel Updates + +### 3.1 Current + +- Adding a chunk: all 4096 blocks added at once to the voxel collider. +- Heavy: `setVoxel` 4096 times + propagation. + +### 3.2 Target + +- Add blocks in **batches** (e.g. 256–512 per tick). +- Time-budget: stop when budget exceeded; resume next tick. +- Rapier voxel API: check if it supports incremental `setVoxel` without full rebuild. + +### 3.3 Implementation Sketch + +```ts +private _pendingVoxelAdds: Array<{ chunk: Chunk; blockTypeId: number; nextIndex: number }> = []; + +function processPendingVoxelAdds(timeBudgetMs: number) { + const start = performance.now(); + while (this._pendingVoxelAdds.length > 0 && (performance.now() - start) < timeBudgetMs) { + const next = this._pendingVoxelAdds[0]; + const chunk = next.chunk; + const count = Math.min(256, chunk.blockCountForType(next.blockTypeId) - next.nextIndex); + for (let i = 0; i < count; i++) { + const idx = next.nextIndex + i; + const globalCoord = chunk.getGlobalCoordinateFromIndex(idx); + collider.setVoxel(globalCoord, true); + } + next.nextIndex += count; + if (next.nextIndex >= chunk.blockCountForType(next.blockTypeId)) { + this._pendingVoxelAdds.shift(); + } + } +} +``` + +--- + +## 4. Trimesh Optimization + +### 4.1 Current + +- Trimesh collider rebuilt whenever any block of that type is added/removed. +- Rebuild = collect all placements, generate mesh, replace collider. + +### 4.2 Options + +1. **Spatial locality:** Only include trimesh blocks from active chunks. Reduces vertex count for large worlds. +2. **Deferred rebuild:** Queue rebuild; execute in next tick within time budget. +3. **Per-chunk trimesh:** If block type is sparse, consider per-chunk trimesh instances instead of one giant trimesh. (Larger change.) + +**Recommendation:** Start with (1) and (2). (3) is Phase 6. + +--- + +## 5. Collider Unload + +When a chunk unloads: + +1. Remove its block placements from the spatial index. +2. 
For each block type in that chunk: + - Voxel: `setVoxel(coord, false)` for each placement. + - Trimesh: trigger rebuild (only over active chunks). +3. Remove chunk from `_activeColliderChunkKeys`. + +--- + +## 6. Rapier Voxel API Notes + +- Check `rapier3d` docs for `ColliderDesc.heightfield` vs `ColliderDesc.voxel`. +- Voxel colliders: typically a 3D grid; `setVoxel` may or may not support incremental updates. +- If full rebuild required per update: minimize rebuild frequency (batch changes) and scope (active chunks only). + +--- + +## 7. Success Criteria + +| Metric | Before | After | +|--------|--------|-------| +| Chunks scanned per collider update | O(world) | O(active) ~100–300 | +| Time per `_combineVoxelStates` | 5–50 ms | <2 ms | +| Collider add spikes | Full chunk at once | Batched, time-budgeted | + +--- + +## References + +- `ChunkLattice.ts` – `_addChunkBlocksToColliders`, `_combineVoxelStates`, `_getBlockTypePlacements` +- Rapier3D voxel API +- Minecraft: per-section collision, spatial culling diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/ENTITY_SYNC_DELTA_COMPRESSION_DESIGN.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/ENTITY_SYNC_DELTA_COMPRESSION_DESIGN.md new file mode 100644 index 00000000..c689a4de --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/ENTITY_SYNC_DELTA_COMPRESSION_DESIGN.md @@ -0,0 +1,226 @@ +# Entity Sync: Delta / Compression Design + +**Goal:** Reduce entity position/rotation packet size and bandwidth (currently ~90% of all packets) by replacing full pos/rot with delta or compressed formats. + +--- + +## 1. Current State + +### Flow +- **Server:** Every tick, `entityManager.checkAndEmitUpdates()` runs; each entity calls `checkAndEmitUpdates()`. 
+- **Entity:** Emits `UPDATE_POSITION` or `UPDATE_ROTATION` when change exceeds threshold: + - **Position:** `ENTITY_POSITION_UPDATE_THRESHOLD_SQ = 0.04²` (0.04 block) + - **Rotation:** `ENTITY_ROTATION_UPDATE_THRESHOLD = cos(3°/2)` (~3°) + - **Player:** Looser position threshold `0.1²` blocks +- **NetworkSynchronizer:** Queues `{ i: id, p: [x,y,z] }` and/or `{ i: id, r: [x,y,z,w] }`. +- **Every 2 ticks (30 Hz):** Splits into reliable vs unreliable; pos/rot-only goes to **unreliable** channel. +- **Serializer:** `serializeVector` → `[x, y, z]`, `serializeQuaternion` → `[x, y, z, w]` (full floats). +- **Transport:** msgpackr with `useFloat32: FLOAT32_OPTIONS.ALWAYS` → 4 bytes per float. + +### Per-Entity Packet Size (approx) +| Format | Bytes (msgpack) | +|--------|-----------------| +| `{ i, p }` pos-only | ~25–35 | +| `{ i, r }` rot-only | ~30–40 | +| `{ i, p, r }` both | ~50–65 | +| 10 entities, pos+rot | ~500–650 | + +With 20 entities at 30 Hz: **~15–20 KB/s** for entity sync alone. + +--- + +## 2. Options for Delta / Compression + +### Option A: Quantized Position (Fixed-Point) + +**Idea:** Encode position as integers. 1 unit = 1/256 block → 0.004 block precision. + +- Range ±32768 blocks → 16-bit signed per axis. +- 3 × 2 bytes = **6 bytes** vs 3 × 4 = 12 bytes (float32). +- **~50% smaller** for position. + +**Implementation:** +```ts +// Server +const QUANT = 256; +p: [Math.round(x * QUANT), Math.round(y * QUANT), Math.round(z * QUANT)] + +// Client +position.x = p[0] / QUANT; // etc. +``` + +**Trade-off:** Precision ~0.004 block. For player/NPC movement this is fine. For very small objects, may need higher quant (e.g. 1024). + +--- + +### Option B: Quantized Quaternion (Smallest-Three) + +**Idea:** Unit quaternion has `q.x² + q.y² + q.z² + q.w² = 1`. Store the 3 components with largest magnitude; reconstruct 4th. + +- 3 × 2 bytes (quantized) = **6 bytes** vs 4 × 4 = 16 bytes. +- **~62% smaller** for rotation. 
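For reference, a sketch of the standard smallest-three scheme. It drops the component with the **largest** magnitude (recording its 2-bit index) and stores the other three, which are guaranteed to lie in [−1/√2, 1/√2]. Because q and −q represent the same rotation, the sign is flipped so the dropped component is non-negative and can be rebuilt with a square root. Three 16-bit values plus the 2-bit index take 50 bits, so reaching exactly 6 bytes would require packing the index into spare bits. Type and function names are hypothetical; this is not code from the HYTOPIA repo.

```typescript
// Smallest-three quaternion compression (hypothetical helpers, not HYTOPIA code).
// Layout [x, y, z, w]; input is assumed to be unit length.

type Quat = [number, number, number, number];

interface PackedQuat {
  largestIndex: number; // 0..3: which component was dropped (fits in 2 bits)
  values: [number, number, number]; // remaining three components as signed 16-bit ints
}

const MAX_COMPONENT = Math.SQRT1_2; // non-dropped components lie in [-1/sqrt(2), 1/sqrt(2)]
const QUANT = 32767;

function packQuat(q: Quat): PackedQuat {
  // Find the component with the largest magnitude; it is the one we drop.
  let largestIndex = 0;
  for (let i = 1; i < 4; i++) {
    if (Math.abs(q[i]) > Math.abs(q[largestIndex])) largestIndex = i;
  }
  // q and -q are the same rotation: flip so the dropped component is non-negative.
  const sign = q[largestIndex] < 0 ? -1 : 1;
  const rest: number[] = [];
  for (let i = 0; i < 4; i++) {
    if (i === largestIndex) continue;
    // Normalize to [-1, 1] (clamped against rounding) and quantize to int16.
    const v = Math.max(-1, Math.min(1, (sign * q[i]) / MAX_COMPONENT));
    rest.push(Math.round(v * QUANT));
  }
  return { largestIndex, values: [rest[0], rest[1], rest[2]] };
}

function unpackQuat(p: PackedQuat): Quat {
  const out: Quat = [0, 0, 0, 0];
  let sumSq = 0;
  let j = 0;
  for (let i = 0; i < 4; i++) {
    if (i === p.largestIndex) continue;
    const v = (p.values[j++] / QUANT) * MAX_COMPONENT;
    out[i] = v;
    sumSq += v * v;
  }
  // Rebuild the dropped component from the unit-length constraint.
  out[p.largestIndex] = Math.sqrt(Math.max(0, 1 - sumSq));
  return out;
}
```

The decoded quaternion may be the negation of the input; both describe the same orientation, so consumers should not compare components directly.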
+ +**Implementation:** Standard "smallest three" quaternion compression (e.g. [RigidBodyDynamics](https://github.com/gameworks-builder/rigid-body-dynamics) style). Needs protocol change to support packed format. + +--- + +### Option C: Yaw-Only for Player Rotation + +**Idea:** Many entities (players, NPCs) only rotate around Y. Send 1 float (yaw) instead of 4. + +- **4 bytes** vs 16 bytes. +- **75% smaller** for rotation when applicable. + +**Caveat:** Doesn't work for entities with pitch/roll (e.g. flying, vehicles). Use as opt-in per entity type. + +--- + +### Option D: Delta Encoding (Δ from Last Sent) + +**Idea:** Send `Δp = p - p_last` instead of absolute `p`. Small movements → small deltas → msgpack encodes as smaller integers. + +- No schema change; still `[dx, dy, dz]` but values typically small. +- msgpack variable-length integers: small values use 1 byte. +- **Benefit:** 20–50% smaller when movement is small. No extra state on client if server tracks last-sent. + +**Implementation:** Server stores `_lastSentPosition` per entity per player (or broadcast). Send delta; client adds to last known position. Requires client to track "last applied" position. + +--- + +### Option E: Bulk / AoS Format + +**Idea:** Instead of `[{i:1,p:[x,y,z]},{i:2,p:[x,y,z]},...]` use structure of arrays: + +```ts +{ ids: [1,2,3], p: [[x,y,z],[x,y,z],[x,y,z]] } +``` + +- Avoids repeating keys `i`, `p` for every entity (msgpack dedup helps but structure still has overhead). +- **Benefit:** ~15–25% smaller from less map/array framing. + +**Caveat:** Requires new packet schema and client deserializer changes. All-or-nothing; can't mix with current EntitySchema in same packet. + +--- + +### Option F: Distance-Based Sync Rate + +**Idea:** Sync nearby entities at 30 Hz, distant at 10 Hz or 5 Hz. + +- **Benefit:** Fewer packets for far entities; natural LOD. 
+- **Implementation:** In `checkAndEmitUpdates` or NetworkSynchronizer, track distance from each player; only queue updates for entity if `tick % rateDivisor === 0` based on distance band. + +--- + +## 3. Recommended Approach + +### Phase 1: Low-Risk Wins (1–2 days each) + +| # | Change | Impact | Effort | +|---|--------|--------|--------| +| 1 | **Quantized position** (1/256 block) | ~50% smaller pos | 1 day | +| 2 | **Distance-based sync rate** (30/15/5 Hz bands) | Fewer far-entity updates | 1 day | +| 3 | **Yaw-only rotation** for player entities | ~75% smaller rot for players | 0.5 day | + +### Phase 2: Schema Changes (3–5 days) + +| # | Change | Impact | Effort | +|---|--------|--------|--------| +| 4 | **Quantized quaternion** (smallest-three) | ~62% smaller rot | 2–3 days | +| 5 | **Bulk entity update packet** | ~15–25% smaller framing | 2 days | + +### Phase 3: Advanced (Optional) + +| # | Change | Impact | Effort | +|---|--------|--------|--------| +| 6 | **Delta encoding** | Additional 20–50% when movement small | 2–3 days | +| 7 | **Client-side prediction** | Reduce perceived latency, fewer corrections | 1+ week | + +--- + +## 4. Protocol Changes Required + +### Option 1: Extend EntitySchema (Backwards Compatible) + +Add optional compressed fields; client detects and uses when present: + +```ts +// New optional fields +EntitySchema = { + i: number; + p?: VectorSchema; // existing: [x,y,z] float + r?: QuaternionSchema; // existing: [x,y,z,w] float + pq?: [number,number,number]; // quantized position (1/256 block) + rq?: [number,number,number]; // quantized quaternion (smallest-three) + ry?: number; // yaw only (radians) + // ... +} +``` + +- Server sends `pq` instead of `p` when quantized format enabled. +- Client checks `pq` first, falls back to `p`. +- Old clients ignore `pq`; new clients prefer `pq` when present. 
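A minimal sketch of the Option 1 fallback, assuming the proposed `pq` field, `POSITION_QUANT = 256`, and fixed int16 packing as in Option A. None of these exist in `protocol/schemas/Entity.ts` on `master`, and the function names are hypothetical. The encoder keeps the float `p` path for positions beyond about ±128 blocks, since int16 at this scale cannot represent them.

```typescript
// Hypothetical extended entity payload; `pq` is a proposal, not the real schema.
interface EntityUpdate {
  i: number;
  p?: [number, number, number]; // float position (current format)
  pq?: [number, number, number]; // quantized position in 1/POSITION_QUANT block units
}

const POSITION_QUANT = 256;
const INT16_MIN = -32768;
const INT16_MAX = 32767;

/** Client side: prefer `pq` when present, fall back to `p`; undefined if neither is set. */
function decodePosition(e: EntityUpdate): [number, number, number] | undefined {
  if (e.pq) {
    return [e.pq[0] / POSITION_QUANT, e.pq[1] / POSITION_QUANT, e.pq[2] / POSITION_QUANT];
  }
  return e.p;
}

/** Server side: send `pq` only when every axis fits in int16, otherwise keep `p`. */
function encodePosition(id: number, pos: [number, number, number], quantEnabled: boolean): EntityUpdate {
  if (quantEnabled) {
    const q = pos.map(v => Math.round(v * POSITION_QUANT));
    if (q.every(v => v >= INT16_MIN && v <= INT16_MAX)) {
      return { i: id, pq: [q[0], q[1], q[2]] };
    }
  }
  return { i: id, p: pos };
}
```

Old clients that ignore `pq` would see no position at all for quantized updates, so in practice the server would likely gate `quantEnabled` on a client capability flag.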
+ +### Option 2: New Packet Type + +Add `EntityPosRotBulkPacket`: + +```ts +{ + ids: number[], + positions?: Int16Array | number[][], // quantized + rotations?: number[][] | Int16Array[] // quantized or yaw-only +} +``` + +- Used only for unreliable pos/rot updates. +- Existing `EntitiesPacket` still used for spawn/reliable updates. + +--- + +## 5. Key Files + +| Component | Path | +|-----------|------| +| Entity update emission | `server/src/worlds/entities/Entity.ts` (checkAndEmitUpdates) | +| Player threshold | `server/src/worlds/entities/PlayerEntity.ts` | +| Network sync queue | `server/src/networking/NetworkSynchronizer.ts` | +| Serializer | `server/src/networking/Serializer.ts` | +| Protocol schema | `protocol/schemas/Entity.ts` | +| Client deserializer | `client/src/network/Deserializer.ts` | +| Client entity update | `client/src/entities/EntityManager.ts` (_updateEntity) | +| Transport | `server/src/networking/Connection.ts`, `client/.../NetworkManager.ts` | + +--- + +## 6. Quantization Constants (Suggested) + +```ts +// Position: 1/256 block = 0.0039 block precision +const POSITION_QUANT = 256; + +// Position range: ±32768 blocks (16-bit signed) +// Covers ~1km in each direction +const POSITION_MAX = 32767; +const POSITION_MIN = -32768; + +// Quaternion: 16-bit per component, range [-1, 1] → 1/32767 precision +const QUATERNION_QUANT = 32767; +``` + +--- + +## 7. Success Metrics + +| Metric | Current | Target (Phase 1) | Target (Phase 2) | +|--------|---------|------------------|------------------| +| Entity bytes/update (10 entities) | ~500–650 | ~300–400 | ~200–280 | +| Entity sync % of total packets | ~90% | ~70% | ~50% | +| Bandwidth (20 entities, 30 Hz) | ~15–20 KB/s | ~8–12 KB/s | ~5–8 KB/s | + +--- + +## 8. 
References + +- [Quaternion Compression (smallest three)](http://gafferongames.com/networked-physics/snapshot-compression/) +- [Minecraft entity sync (delta/quantization)](https://wiki.vg/Protocol#Entity_Metadata) +- Current codebase: `Entity.ts` (checkAndEmitUpdates), `NetworkSynchronizer.ts` (entity sync split), `Serializer.ts` (serializeVector/Quaternion) diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md new file mode 100644 index 00000000..66642683 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md @@ -0,0 +1,182 @@ +# Greedy Meshing Implementation Guide + +**Purpose:** Step-by-step guide for implementing greedy quad merging (cubic/canonical meshing) in Hytopia’s ChunkWorker. +**Audience:** Engineers implementing Phase 4 (Greedy Meshing). +**Prerequisites:** Read [0fps Part 1](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/) and [Part 2](https://0fps.net/2012/07/07/meshing-minecraft-part-2/). + +--- + +## 1. Algorithm Overview + +### 1.1 Input and Output + +- **Input:** Chunk of 16³ blocks. Each block has type ID, optional rotation. +- **Output:** Merged quads (position, size, normal, block type, AO, light). + +### 1.2 High-Level Steps + +1. **Group by (block type, normal, material flags).** Faces with same texture and normal are mergeable. +2. **For each direction** (±X, ±Y, ±Z): + - Build a 2D slice of visible faces (e.g. for +Y, iterate Y layers; for each layer, collect top faces). + - Run 2D greedy merge: combine adjacent same-type faces into rectangles. +3. **Emit merged quads** with correct UVs, AO, and lighting. + +--- + +## 2. 
Detailed Algorithm (0fps Style) + +### 2.1 Slice Extraction + +For direction `+Y` (top faces): + +- For each Y level `y = 0..15`: + - For each (x, z) in 16×16: + - If block at (x, y, z) is solid and block at (x, y+1, z) is air/transparent: + - Add face with normal (0, 1, 0), block type = block at (x, y, z). + - This gives a 16×16 grid of “face presence” per block type. + - Run 2D greedy merge on this grid. + +Repeat for −Y, ±X, ±Z. + +### 2.2 2D Greedy Merge (Per Slice, Per Block Type) + +``` +for each row j in slice: + for each column i in slice: + if visited[i,j]: continue + if no face at (i,j): continue + blockType = face at (i,j) + width = 1 + while i+width < 16 and same block at (i+width, j) and same AO/light: + width++ + height = 1 + while j+height < 16: + row OK = true + for k = 0 to width-1: + if different block or visited[i+k, j+height]: row OK = false; break + if !row OK: break + height++ + mark (i,j)..(i+width-1, j+height-1) as visited + emit quad: origin (i,j), size (width, height), blockType +``` + +### 2.3 Lexicographic Order (0fps) + +To get deterministic, visually stable meshes, merge in a fixed order (e.g. top-to-bottom, left-to-right) and prefer the lexicographically smallest representation when multiple merges are possible. + +--- + +## 3. 
Integration with ChunkWorker + +### 3.1 Current Flow (Simplified) + +``` +for each block in chunk: + for each face (6 directions): + if face visible (neighbor empty/transparent): + emit quad +``` + +### 3.2 New Flow + +``` +// Group 1: Opaque solid blocks (greedy) +for dir in [+X,-X,+Y,-Y,+Z,-Z]: + slice = extractVisibleFaces(chunk, dir) + for blockType in unique block types in slice: + subslice = slice filtered by blockType + quads = greedyMerge2D(subslice, dir) + emit quads with AO, light + +// Group 2: Transparent / special (per-face, existing logic) +for each block in chunk: + if block is transparent or special: + for each face: + if visible: emit quad +``` + +### 3.3 AO and Lighting + +- Ambient occlusion: compute per-vertex AO from neighbor blocks (as today). +- Light: sample from light volume (as today). +- For merged quads: corners may have different AO/light. Options: + - **Option A:** Use min AO/light of the merged region (slightly darker; simpler). + - **Option B:** Subdivide quad where AO/light changes (more quads, better quality). + - **Recommendation:** Start with Option A; optimize later. + +--- + +## 4. Data Structures + +### 4.1 Slice Representation + +```ts +// 16x16 grid, value = block type ID (0 = no face) +type Slice = Uint8Array; // 256 elements + +// Or: (blockTypeId, ao, light) per cell if we merge only when all match +interface SliceCell { + blockTypeId: number; + ao: number; + light: number; +} +``` + +### 4.2 Visited Mask + +```ts +// 16x16 boolean +const visited = new Uint8Array(256); // 1 bit per cell, or just 256 bytes +``` + +### 4.3 Merged Quad Output + +```ts +interface MergedQuad { + x: number; // local origin + y: number; + z: number; + width: number; // in blocks, along one horizontal axis + height: number; // in blocks, along other axis + normal: [number, number, number]; + blockTypeId: number; + ao: number; // or per-corner if subdividing + light: number; +} +``` + +--- + +## 5. Implementation Order + +| Step | Task | Est. 
Time | +|------|------|-----------| +| 1 | Slice extraction for +Y (top faces) | 1 day | +| 2 | 2D greedy merge for +Y slice | 1 day | +| 3 | Apply to all 6 directions | 0.5 day | +| 4 | AO/light handling for merged quads | 1 day | +| 5 | Integration: replace per-face loop for opaque solids | 1 day | +| 6 | Benchmark: vertex count and build time | 0.5 day | +| 7 | Edge cases: chunk boundaries, multi-type batches | 1 day | + +--- + +## 6. Expected Results + +| Terrain Type | Before (vertices) | After (est.) | Reduction | +|--------------|-------------------|--------------|-----------| +| Flat 16×16 | ~6000 | ~200 | ~30× | +| Hilly | ~8000 | ~800 | ~10× | +| Caves | ~4000 | ~600 | ~7× | +| Mixed | ~6000 | ~500 | ~12× | + +Build time may increase by 10–30% due to extra passes; vertex reduction should yield net FPS gain. + +--- + +## 7. References + +- [0fps Part 1 – Meshing in a Minecraft Game](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/) +- [0fps Part 2 – Multiple block types](https://0fps.net/2012/07/07/meshing-minecraft-part-2/) +- [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher) (JavaScript reference) +- [Vercidium greedy voxel meshing gist](https://gist.github.com/Vercidium/a3002bd083cce2bc854c9ff8f0118d33) diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/MAP_ENGINE_ARCHITECTURE.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/MAP_ENGINE_ARCHITECTURE.md new file mode 100644 index 00000000..f58f543a --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/MAP_ENGINE_ARCHITECTURE.md @@ -0,0 +1,272 @@ +# Hytopia Map Engine Architecture + +This document describes how the Hytopia map engine is set up, its data flow, and a roadmap for adapting it to support **binary maps** for extremely large worlds (e.g., 100k×100k×64 blocks). + +--- + +## 1. 
Architecture Overview + +The map engine spans **server** (authoritative block state), **client** (rendering, meshing), and **protocol** (network serialization). Maps are loaded once at world initialization and populate a chunk-based block lattice. + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ MAP LOAD PIPELINE │ +├─────────────────────────────────────────────────────────────────────────┤ +│ │ +│ JSON Map File World.loadMap() ChunkLattice │ +│ (blockTypes, blocks, ───────────────► initializeBlockEntries() │ +│ entities) │ │ │ +│ │ │ ▼ │ +│ │ │ ChunkLattice clears, │ +│ │ │ creates Chunks, │ +│ │ │ builds colliders │ +│ │ │ │ │ +│ │ ▼ ▼ │ +│ │ BlockTypeRegistry Map │ +│ │ (block types) (sparse chunks) │ +│ │ │ │ +│ │ ▼ │ +│ │ NetworkSynchronizer │ +│ │ (chunk sync to │ +│ │ clients) │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## 2. WorldMap Interface (JSON Format) + +Maps conform to the `WorldMap` interface used by `World.loadMap()`: + +| Section | Purpose | Location | +|---------------|--------------------------------------------------------|-------------------------------| +| `blockTypes` | Block type definitions (id, name, textureUri, etc.) | `server/src/worlds/World.ts` | +| `blocks` | Block placements keyed by `"x,y,z"` string | `WorldMap.blocks` | +| `entities` | Entity spawns keyed by `"x,y,z"` position | `WorldMap.entities` | + +### Block Format in JSON + +Each block entry is either: + +- **Short form:** `"x,y,z": ` (e.g. `"-25,0,-16": 7`) +- **Extended form:** `"x,y,z": { "i": , "r": }` + +Coordinates are **world block coordinates** (integers). Block type IDs are 0–255 (0 = air, 1–255 = registered block types). 
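A small sketch of decoding one entry in either form, mirroring what `World.loadMap()` does inline (split the key on commas; read `i`/`r` from the extended form). The type and function names are hypothetical, and the validation shown is an illustrative addition, not behavior confirmed in the engine.

```typescript
// Hypothetical types mirroring the two JSON block forms described above.
type BlockValue = number | { i: number; r?: number };

interface BlockEntry {
  x: number;
  y: number;
  z: number;
  blockTypeId: number; // 0 = air, 1-255 = registered block types
  rotationIndex?: number; // only present in the extended form
}

/** Decode one `"x,y,z": value` pair from a WorldMap `blocks` object. */
function parseBlockEntry(key: string, value: BlockValue): BlockEntry {
  const parts = key.split(',').map(Number);
  if (parts.length !== 3 || parts.some(n => !Number.isInteger(n))) {
    throw new Error(`invalid block key: "${key}"`);
  }
  const [x, y, z] = parts;
  const blockTypeId = typeof value === 'number' ? value : value.i;
  const rotationIndex = typeof value === 'number' ? undefined : value.r;
  if (!Number.isInteger(blockTypeId) || blockTypeId < 0 || blockTypeId > 255) {
    throw new Error(`block type id out of range 0-255: ${blockTypeId}`);
  }
  return { x, y, z, blockTypeId, rotationIndex };
}

// Usage with the short and extended forms:
console.log(parseBlockEntry('-25,0,-16', 7));
console.log(parseBlockEntry('3,1,4', { i: 12, r: 5 }));
```

Every entry pays for a string key and a `split`/`Number` pass, which is part of why the size and parse-cost concerns below grow with map size.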
+ +### Size Implications of JSON Maps + +| Factor | Impact | +|---------------------------|-----------------------------------------------------------------------| +| Sparse object keys | Each block = `"x,y,z"` string key (10–20+ chars) + JSON overhead | +| No chunk-level batching | All blocks listed individually; no spatial grouping | +| Parsing cost | Full JSON parse loads entire map into memory before processing | +| File size | `boilerplate-small.json` ≈ 4,600+ lines; `big-world` ≈ 309,000+ lines | + +For a **100k×100k×64** fully dense map: + +- Blocks: 640 billion +- JSON would be impractically huge (hundreds of GB+ as text) +- Even sparse terrain would produce multi-GB JSON for large worlds + +--- + +## 3. Chunk Model + +### Chunk Dimensions + +| Constant | Value | Location | +|----------------|-------|--------------------------------------| +| `CHUNK_SIZE` | 16 | `server/src/worlds/blocks/Chunk.ts` | +| `CHUNK_VOLUME` | 4096 | 16³ blocks per chunk | +| `MAX_BLOCK_TYPE_ID` | 255 | `Chunk.ts` | + +Chunk origins are multiples of 16 on each axis (e.g. `(0,0,0)`, `(16,0,0)`, `(0,16,0)`). + +### Chunk Storage + +- **`Chunk._blocks`:** `Uint8Array(4096)` – block type ID per voxel +- **`Chunk._blockRotations`:** `Map` – sparse map of block index → rotation +- **Block index:** `x + (y << 4) + (z << 8)` (local coords 0–15) + +Chunks are stored in `ChunkLattice._chunks` as `Map` keyed by packed chunk origin: + +```typescript +// ChunkLattice._packCoordinate() – 54 bits per axis +chunkKey = (x << 108) | (y << 54) | z +``` + +--- + +## 4. Load Flow: `World.loadMap()` + +```typescript +// server/src/worlds/World.ts +public loadMap(map: WorldMap) { + this.chunkLattice.clear(); + + // 1. Register block types + if (map.blockTypes) { + for (const blockTypeData of map.blockTypes) { + this.blockTypeRegistry.registerGenericBlockType({ ... }); + } + } + + // 2. 
Iterate blocks as generator, feed to ChunkLattice + if (map.blocks) { + const mapBlocks = map.blocks; + const blockEntries = function* () { + for (const key in mapBlocks) { + const blockValue = mapBlocks[key]; + const blockTypeId = typeof blockValue === 'number' ? blockValue : blockValue.i; + const blockRotationIndex = typeof blockValue === 'number' ? undefined : blockValue.r; + const [x, y, z] = key.split(',').map(Number); + // Simplified excerpt: the index may be mapped to a rotation value before this point + yield { globalCoordinate: { x, y, z }, blockTypeId, blockRotation: blockRotationIndex }; + } + }; + this.chunkLattice.initializeBlockEntries(blockEntries()); + } + + // 3. Spawn entities + if (map.entities) { ... } +} +``` + +### `ChunkLattice.initializeBlockEntries()` + +- Clears the lattice +- For each block: resolves chunk, creates chunk if needed, calls `chunk.setBlock()` +- Tracks block placements per type for colliders +- After all blocks: builds one collider per block type (voxel or trimesh) + +--- + +## 5. Client-Server Chunk Sync + +Chunks are serialized and sent to clients via `NetworkSynchronizer`: + +| Protocol Field | Description | +|----------------|--------------------------------------| +| `c` | Chunk origin `[x, y, z]` | +| `b` | Block IDs `Uint8Array \| number[]` (4096) | +| `r` | Rotations: flat `[blockIndex, rotIndex, ...]` | +| `rm` | Chunk removed flag | + +- **Serializer:** `Serializer.serializeChunk()` → `protocol.ChunkSchema` +- **Client:** `Deserializer.deserializeChunk()` → `DeserializedChunk` +- **ChunkWorker:** Receives `chunk_update`, registers chunk, builds meshes + +The client does **not** load the JSON map. It receives chunks from the server over the network after a player joins a world. + +--- + +## 6.
Key Files Reference + +| Component | Path | +|----------------------|--------------------------------------------------| +| WorldMap interface | `server/src/worlds/World.ts` | +| loadMap | `server/src/worlds/World.ts` | +| ChunkLattice | `server/src/worlds/blocks/ChunkLattice.ts` | +| Chunk | `server/src/worlds/blocks/Chunk.ts` | +| ChunkSchema (proto) | `protocol/schemas/Chunk.ts` | +| Serializer | `server/src/networking/Serializer.ts` | +| ChunkWorker (client) | `client/src/workers/ChunkWorker.ts` | +| Deserializer | `client/src/network/Deserializer.ts` | + +--- + +## 7. Binary Map Adaptation Roadmap for 100k×100k×64 + +To support huge maps efficiently, the engine should move from JSON to **binary map sources** with **chunk-level loading** and **streaming**. + +### 7.1 Binary Chunk Format (Proposed) + +Store one file per chunk, or one file per region of chunks: + +``` +chunk.{cx}.{cy}.{cz}.bin OR region.{rx}.{ry}.{rz}.bin +``` + +**Suggested layout per chunk (raw):** + +| Offset (bytes) | Size (bytes) | Content | +|--------|--------|------------------------------------------| +| 0 | 12 | Origin (3× int32: x, y, z) | +| 12 | 4096 | Block IDs (Uint8Array) | +| 4108 | var | Sparse rotations: count + [idx, rot]... | + +Or use a compact format (e.g. run-length encoding for air, or palette indices) for sparse chunks. + +### 7.2 Streaming / Lazy Loading + +- **Do not** load the entire map into memory. +- Use a **chunk provider** that: + - Accepts `(chunkOriginX, chunkOriginY, chunkOriginZ)` and returns chunk data + - Reads from binary files, memory-mapped files, or a database +- Replace the current `loadMap()` bulk load with: + - Initial load of a small seed area (e.g. spawn region) + - On-demand loading when `ChunkLattice.getOrCreateChunk()` needs a chunk not yet in memory + +### 7.3 Implementation Strategy + +1.
**`MapProvider` interface** + ```typescript + interface MapProvider { + getChunk(origin: Vector3Like): ChunkData | null | Promise<ChunkData | null>; + getBlockTypes(): BlockTypeOptions[]; + } + ``` + +2. **`BinaryMapProvider`** + - Reads `.bin` chunk files from disk or object storage + - Maps chunk origin → file path or byte range + - Returns `{ blocks: Uint8Array, rotations: Map<number, number> }` (block index → rotation index) + +3. **ChunkLattice changes** + - Replace `initializeBlockEntries()` full load with lazy `getOrCreateChunk()` that: + - Checks `_chunks` cache + - If miss: calls `MapProvider.getChunk()`, creates `Chunk`, inserts into `_chunks` (when the provider returns a Promise, this path must become async or hand back a pending placeholder) + - Optionally preload chunks in a radius around player(s) + +4. **Block types** + - Keep block types in a small JSON or separate binary; they are tiny compared to block data. + - Load once at startup; no need to stream. + +### 7.4 Scale Estimates for 100k×100k×64 + +| Metric | Value | +|---------------------------|--------------------------| +| World dimensions | 100,000 × 100,000 × 64 | +| Chunks (16³) | 6,250 × 6,250 × 4 ≈ 156M chunks | +| Bytes per chunk (raw) | ~4.1 KB (blocks only) | +| Raw block data (if dense) | ~640 GB | +| Sparse (e.g. surface) | Much less; only store non-air chunks | + +Binary format advantages: + +- No JSON parsing; direct `Uint8Array` use +- Chunk-level I/O; load only what’s needed +- Possible memory-mapping for large files +- Optional compression (e.g. LZ4, Zstd) per chunk or region + +### 7.5 Migration Path + +1. **Phase 1:** Add `BinaryMapProvider` that reads chunk `.bin` files; `loadMap()` can accept `WorldMap | MapProvider`. +2. **Phase 2:** Make `ChunkLattice.getOrCreateChunk()` use the provider when a chunk is missing. +3. **Phase 3:** Add tooling to convert existing JSON maps → binary chunk files. +4. **Phase 4:** Optional region/compression format for production. + +--- + +## 8.
Summary + +| Current (JSON) | Target (Binary + Streaming) | +|----------------------------|----------------------------------| +| Full map in memory | Chunk-level loading | +| Single large JSON parse | Small reads per chunk | +| Sparse object keys | Dense `Uint8Array` per chunk | +| Not viable at 100k×100k×64 scale | Designed for huge worlds | + +The existing `Chunk` and `ChunkLattice` design already matches a chunk-oriented model. The main changes are: + +1. Replace JSON as the map source with a binary chunk provider. +2. Add lazy loading so chunks are fetched on demand. +3. Provide conversion tools and a clear binary chunk layout. diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/MINECRAFT_ARCHITECTURE_RESEARCH.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/MINECRAFT_ARCHITECTURE_RESEARCH.md new file mode 100644 index 00000000..c7280c72 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/MINECRAFT_ARCHITECTURE_RESEARCH.md @@ -0,0 +1,161 @@ +# Minecraft Architecture Research + +**Purpose:** Inform Hytopia’s voxel engine design with lessons from Minecraft Java and Bedrock. +**Audience:** Engineers implementing chunk loading, colliders, and meshing. +**Sources:** Technical wikis, decompilations, community analysis, engine talks. + +--- + +## 1. Chunk System Overview + +### 1.1 Chunk Structure + +| Version | Chunk Size | Subchunk | Notes | +|---------|------------|----------|-------| +| Java | 16×384×16 since 1.18; 16×256×16 before (XZ columns) | 16×16×16 sections | Vertical column; sections loaded independently | +| Bedrock | 16×384×16 Overworld since 1.18; 16×256×16 before | 16×16×16 | Similar; different storage layout | + +**Hytopia:** 16×16×16 chunks, 2×2×2 batches (32³). Aligns with common practice.
+ +### 1.2 Loading States (Java 1.14+) + +Minecraft separates chunk lifecycle into distinct states: + +| State | Purpose | +|-------|---------| +| **Empty** | Not loaded | +| **Structure** | Structures placed | +| **Noise** | Terrain generated | +| **Surface** | Surface blocks, biomes | +| **Carvers** | Caves, ravines | +| **Features** | Trees, ores, etc. | +| **Entity ticking** | Physics, entities, block updates | + +**Key insight:** Entity ticking requires a 5×5 grid of loaded chunks around the center chunk. Border chunks can be “lazy” (block updates only, no entities). This **spatial locality** keeps entity/physics work bounded. + +**Hytopia takeaway:** Only tick entities and step physics for chunks near players. Don’t pay for distant chunks. + +### 1.3 Spawn Chunks + +- 19×19 chunks (Java) or 23×23 (Bedrock) always loaded around spawn. +- Only center ~12×12 process entities. +- Reduces load/unload churn at spawn. + +**Hytopia:** Preload radius already exists; consider an “always loaded” spawn core for hubs. + +--- + +## 2. File I/O and Region Format + +### 2.1 Region Files + +- One file per 32×32 chunk region (XZ). +- Anvil format: 8 KB header (a 4 KB location table of 1024 entries × 4 bytes, then a 4 KB timestamp table) + chunk payloads in 4 KB sectors. +- Chunks stored with length prefix + compression (typically zlib; Bedrock uses different schemes). +- **Async I/O:** Modern implementations use background threads; main thread never blocks on disk. + +### 2.2 Chunk Serialization + +- Block IDs, block states, light, heightmap, biomes stored per chunk. +- Compression reduces size by ~90% for typical terrain. + +**Hytopia:** Region format exists; `readChunkAsync` and `writeChunk` (sync) are in place. Priority: make persist async. + +--- + +## 3. Terrain Generation + +### 3.1 Worker Pool + +- Terrain generation runs in worker threads. +- Main thread requests chunk; worker generates; result returned asynchronously. +- Multiple workers allow parallelism. 
+ +### 3.2 Generation Stages + +- Noise → carvers → features (trees, ores).
+- Each stage can be parallelized or deferred. + +**Hytopia:** `TerrainWorkerPool` + `generateChunkAsync` exist. Ensure `requestChunk` uses this path and doesn’t fall back to sync. + +--- + +## 4. Physics and Collision + +### 4.1 Chunk-Section Colliders + +- Collision is built per 16×16×16 section. +- Sections far from players may not have colliders at all, or use simplified shapes. +- Colliders are created/updated in batches, not all at once. + +### 4.2 Spatial Partitioning + +- Physics world uses spatial partitioning (e.g. broadphase). +- Entity vs. block collision: only check nearby chunks. +- No global scan over entire world. + +**Hytopia gap:** `_combineVoxelStates` iterates all chunks of a block type. Must restrict to nearby chunks. + +--- + +## 5. Meshing and Rendering + +### 5.1 Greedy Meshing + +- Minecraft uses an approximation of greedy meshing (block model merging). +- Adjacent faces of same block type are merged into larger quads where possible. +- Results in 2–64× fewer quads than per-face rendering. + +### 5.2 Occlusion Culling + +- Section-level visibility: if a section is fully behind solid terrain, skip rendering. +- BFS from camera through air/transparent blocks; mark visible sections. +- ~10–15% frame time savings in cave-heavy areas. + +### 5.3 LOD + +- Distant chunks use lower-detail meshes or impostors. +- Reduces overdraw and vertex count. + +**Hytopia:** Face culling ✅; greedy meshing ❌; occlusion partial; LOD step 2/4. Biggest win: greedy meshing. + +--- + +## 6. Network + +### 6.1 Chunk Packets + +- Chunks sent incrementally; rate-limited to avoid client flood. +- Delta updates for modified chunks (block changes) vs. full chunk for new loads. + +### 6.2 Entity Sync + +- Position/rotation use compact encodings (fixed-point or quantized). +- Entities use delta or relative positioning where possible. +- Distant entities may sync at lower rate.
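Delta positioning can be sketched as follows. Server and client both keep the last quantized position the client acknowledged, and the server sends a bitmask plus only the axes that changed. This is a minimal illustration that assumes 1/256-block fixed-point integers. `quantize`, `encodeDelta`, `applyDelta`, and the mask layout are hypothetical and do not describe Minecraft's wire format.

```typescript
// Quantized position in 1/256-block steps (an assumed precision).
type QuantizedPosition = [number, number, number];

interface PositionDelta {
  mask: number; // bit 0 = x changed, bit 1 = y, bit 2 = z
  deltas: number[]; // only the changed axes, in x, y, z order
}

const QUANT = 256;

function quantize(x: number, y: number, z: number): QuantizedPosition {
  return [Math.round(x * QUANT), Math.round(y * QUANT), Math.round(z * QUANT)];
}

/** Server side: encodes `current` relative to the last position the client acknowledged. */
function encodeDelta(baseline: QuantizedPosition, current: QuantizedPosition): PositionDelta {
  let mask = 0;
  const deltas: number[] = [];
  for (let axis = 0; axis < 3; axis++) {
    const d = current[axis] - baseline[axis];
    if (d !== 0) {
      mask |= 1 << axis;
      deltas.push(d);
    }
  }
  return { mask, deltas };
}

/** Client side: applies a delta to the same acknowledged baseline. */
function applyDelta(baseline: QuantizedPosition, delta: PositionDelta): QuantizedPosition {
  const result: QuantizedPosition = [baseline[0], baseline[1], baseline[2]];
  let next = 0;
  for (let axis = 0; axis < 3; axis++) {
    if (delta.mask & (1 << axis)) {
      result[axis] += delta.deltas[next++];
    }
  }
  return result;
}
```

The baseline must be the last *acknowledged* state, not the last state sent. Otherwise a dropped unreliable packet leaves the client applying deltas to the wrong base.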
+ +**Source:** [Minecraft Protocol (wiki.vg)](https://wiki.vg/Protocol#Entity_Metadata) + +--- + +## 7. Lessons for Hytopia + +| Minecraft Pattern | Hytopia Status | Action | +|-------------------|----------------|--------| +| Async chunk load | ✅ `requestChunk` + `getChunkAsync` | Verify usage | +| Async I/O | ✅ `readChunkAsync` | Make persist async | +| Worker terrain gen | ✅ TerrainWorkerPool | Verify | +| Collider locality | ❌ O(world) scans | Phase 1: spatial index, scoped merge | +| Greedy meshing | ❌ | Phase 4 | +| Occlusion | ⚠️ Partial | Phase 5 | +| Entity quantization | ❌ | Phase 3 | +| Distance-based sync | ❌ | Phase 3 | + +--- + +## References + +- [Chunk Loading – Technical Minecraft Wiki](https://techmcdocs.github.io/pages/GameMechanics/ChunkLoading/) +- [Minecraft Protocol – wiki.vg](https://wiki.vg/Protocol) +- [0fps Meshing in a Minecraft Game](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/) +- [Fabric Modding Documentation (chunk loading states)](https://fabricmc.net/wiki/) diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/NETWORK_PROTOCOL_2026_RESEARCH.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/NETWORK_PROTOCOL_2026_RESEARCH.md new file mode 100644 index 00000000..722d42a8 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/NETWORK_PROTOCOL_2026_RESEARCH.md @@ -0,0 +1,130 @@ +# Network Protocol 2026 Research + +**Purpose:** Modern entity sync and chunk sync patterns for low-bandwidth, low-latency voxel multiplayer. +**Audience:** Engineers implementing Phase 3 (Entity Sync Compression). + +--- + +## 1. Entity Sync: Industry Patterns + +### 1.1 Minecraft (Java) + +- Entity position/rotation sent as fixed-point or scaled integers. +- Metadata uses compact type tags. +- Delta updates for moving entities; full state on spawn or major change. + +### 1.2 Source Engine / Garry’s Mod + +- **Delta compression:** Send only changed fields; baseline is last full update. 
+- **Quantization:** Position in 1/16 or 1/32 unit; angles in 16-bit. + +### 1.3 Overwatch / Modern FPS + +- Client-side prediction + server reconciliation. +- Entity updates at 20–60 Hz for nearby; lower for distant. +- Snapshot compression: delta from previous snapshot. + +### 1.4 Gaffer On Games (Networked Physics) + +- [Snapshot Compression](http://gafferongames.com/networked-physics/snapshot-compression/) +- Quaternion: drop the largest-magnitude component and store the other three (smallest-three); the dropped one is derived. +- Position: fixed-point or quantized. +- Delta encoding: send difference from last acked state. + +--- + +## 2. Quantization Formulas + +### 2.1 Position (Fixed-Point) + +```ts +const QUANT = 256; // 1/256 block = 0.0039 block precision +const clamp = (v: number) => Math.max(-32768, Math.min(32767, Math.round(v * QUANT))); + +// Encode (x, y, z: position in blocks) +const pq = [clamp(x), clamp(y), clamp(z)]; // store as Int16Array or [number, number, number] + +// Decode +const decodedX = pq[0] / QUANT; +``` + +**Range:** Int16 holds -32768 to 32767 steps, so at 1/256 block per step this covers only ±128 blocks (32768 / 256). That is enough for chunk- or player-relative offsets but not for absolute positions in a large world; use Int32, a coarser `QUANT`, or positions relative to a nearby origin. + +### 2.2 Quaternion (Smallest-Three) + +- Unit quaternion: `q.x² + q.y² + q.z² + q.w² = 1`. +- One component can be derived from the other three (up to sign; `q` and `-q` are the same rotation, so flip `q` to make the derived component positive). +- Drop the component with the largest magnitude and store the other three (each lies in [-1/√2, 1/√2]); 1 byte for index of omitted component. +- Quantize each stored component to 16-bit: `value * 32767` for range [-1, 1]. + +**Size:** 1 + 3×2 = 7 bytes vs 4×4 = 16 bytes (float32). ~56% smaller. + +**Reference:** [Gaffer On Games](http://gafferongames.com/networked-physics/snapshot-compression/) + +### 2.3 Yaw-Only (Euler) + +- For entities that only rotate around Y: send 1 float (radians) or 16-bit quantized. +- `yaw = 2*PI * (int16 / 65536)`. +- 2 bytes vs 16 bytes for full quaternion. + +--- + +## 3.
Distance-Based Sync Rate + +| Distance Band | Sync Rate | Use Case | +|---------------|-----------|----------| +| 0–4 chunks | 30 Hz | Player, nearby NPCs | +| 4–8 chunks | 15 Hz | Mid-range entities | +| 8+ chunks | 5 Hz | Far entities, environmental | + +**Implementation:** In `checkAndEmitUpdates` or NetworkSynchronizer, compute distance from nearest player; only emit if `tick % rateDivisor === 0`. + +--- + +## 4. Bulk Format (Structure of Arrays) + +Instead of: + +```json +[ + { "i": 1, "p": [10.5, 20.1, 30.2] }, + { "i": 2, "p": [11.2, 20.0, 31.1] } +] +``` + +Use: + +```json +{ + "ids": [1, 2], + "p": [[2688, 5146, 7731], [2867, 5120, 7962]] +} +``` + +- Quantized positions in `p` (Int16; each value is `round(v * 256)`). +- Avoids repeating keys; msgpack benefits from smaller maps. +- **Caveat:** New packet type; client must support. Can run parallel to existing EntitiesPacket during migration. + +--- + +## 5. Protocol Versioning + +- Add optional fields to EntitySchema: `pq`, `rq`, `ry`. +- Old clients ignore unknown fields; new clients prefer them. +- Server flag: `useQuantizedEntitySync=true` (default for new connections after version bump). + +--- + +## 6. Chunk Delta Updates (Phase 6) + +- When a single block changes, send delta: `{ chunkId, blockIndex, blockTypeId }` instead of full chunk. +- Client applies delta to local chunk; requests full chunk if out of sync. +- Reduces bandwidth for frequent block edits (mining, building). + +--- + +## 7.
References + +- [Gaffer On Games – Snapshot Compression](http://gafferongames.com/networked-physics/snapshot-compression/) +- [Minecraft Protocol – wiki.vg](https://wiki.vg/Protocol) +- [ENTITY_SYNC_DELTA_COMPRESSION_DESIGN.md](../ENTITY_SYNC_DELTA_COMPRESSION_DESIGN.md) – Hytopia-specific design diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/SMOOTH_WORLD_STREAMING_REFACTOR_PLAN.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/SMOOTH_WORLD_STREAMING_REFACTOR_PLAN.md new file mode 100644 index 00000000..cead4812 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/SMOOTH_WORLD_STREAMING_REFACTOR_PLAN.md @@ -0,0 +1,180 @@ +# Smooth World Streaming Refactor Plan + +> **Canonical roadmap:** See [VOXEL_ENGINE_2026_MASTER_PLAN.md](./VOXEL_ENGINE_2026_MASTER_PLAN.md) for the full executive plan and phased roadmap. This document provides additional context and cross-references. + +**Goal:** Peak performance for the procedurally generated world—smooth streaming, no lag spikes, Minecraft/Hytale/bloxd-level polish. + +**Sources:** Codebase analysis, [VOXEL_PERFORMANCE_MASTER_PLAN.md](./VOXEL_PERFORMANCE_MASTER_PLAN.md), [CHUNK_LOADING_ARCHITECTURE.md](./CHUNK_LOADING_ARCHITECTURE.md), [VOXEL_RENDERING_RESEARCH.md](./VOXEL_RENDERING_RESEARCH.md), [PR #21](https://github.com/hytopiagg/hytopia-source/pull/21), and industry patterns from Minecraft, Hytale, and Bloxd. + +--- + +## 1. 
Competitive Analysis: Minecraft vs Hytale vs Bloxd vs Hytopia + +| Aspect | Minecraft | Hytale | Bloxd | Hytopia (Current) | +|--------|-----------|--------|-------|-------------------| +| **Chunk load** | Worker threads, async | Worker pool | JS async | ✅ `requestChunk` + `getChunkAsync` (TerrainWorkerPool) | +| **File I/O** | Async | Async | N/A (streaming) | ✅ `readChunkAsync` (PersistenceChunkProvider) | +| **Terrain gen** | Worker threads | Worker pool | — | ✅ `generateChunkAsync` (TerrainWorkerPool) | +| **Physics colliders** | Deferred, O(chunk) | Batched, spatial | Custom voxel | ❌ Sync, O(world) via `_combineVoxelStates` | +| **Collider locality** | Per-chunk, near player | Spatial culling | — | ⚠️ Partial (COLLIDER_MAX_CHUNK_DISTANCE=3) | +| **Greedy meshing** | ✅ | ✅ (mesh culling) | ✅ | ❌ 1 quad/face, ~64× extra geometry | +| **Chunk send rate** | Incremental, rate-limited | Batched | Streaming | ⚠️ MAX_CHUNKS_PER_SYNC=8, can burst | +| **Entity sync** | Delta / compressed | — | — | Full pos/rot 30 Hz, 90%+ of packets | +| **LOD** | ✅ | Variable chunk sizes | — | ✅ (step 2/4) | +| **Occlusion** | Cave culling | Partial | — | ⚠️ Only when over face limit | +| **Vertex pooling** | — | — | ✅ | ⚠️ Partial (size-match reuse) | +| **Map compression** | Region format | — | — | ❌ JSON maps large; PR #21 adds compression | + +**Gap summary:** Hytopia’s biggest gaps are (1) collider work O(world) and sync, (2) no greedy meshing, (3) entity sync volume, (4) JSON map size for non-procedural games. Procedural world already uses async load + worker terrain gen; collider and client-side mesh work are the main bottlenecks. + +--- + +## 2. PR #21 Relevance to Procedural World + +[PR #21: Compressed world maps](https://github.com/hytopiagg/hytopia-source/pull/21) targets **JSON maps** (`loadMap(map.json)`), not procedural/region worlds. It adds: + +| Feature | Applies to Procedural? 
| Notes | +|---------|------------------------|-------| +| `map.compressed.json` | ❌ | JSON map format only | +| `map.chunks.bin` (chunk cache) | ❌ | Prebaked JSON map chunks | +| Chunk cache collider build | ⚠️ Partially | “perf: speed up chunk cache collider build” can inform collider design | +| Brotli compression | ❌ | For map JSON, not region .bin | +| Auto-detect / `hytopia map-compress` | ❌ | JSON map workflow | + +**Recommendation:** Merge PR #21 for JSON-map games (huntcraft, boilerplate, etc.). For procedural world, reuse the collider build approach where relevant. Procedural persistence uses region `.bin`; consider Brotli for region payloads later. + +--- + +## 3. Root Cause Summary + +When a player joins and blocks have physics: + +1. **Physics step (60 Hz):** Rapier steps the entire world, including all block colliders + player rigid body. +2. **Collider creation:** `_addChunkBlocksToColliders` → `_combineVoxelStates` scans all chunks of each block type (O(world)). +3. **Entity sync (30 Hz):** Full position/rotation for entities/players every 2 ticks; dominates packet volume. +4. **Chunk sync:** Up to 8 chunks per sync; client mesh build can spike main thread. +5. **Client mesh:** No greedy meshing → 2–64× more vertices than needed. +6. **ADD_CHUNK events:** Environmental entity spawn per chunk runs synchronously. + +--- + +## 4. Refactoring Plan (Prioritized) + +### Phase 1: Stop the Bleeding (1–2 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 1.1 | **Collider locality – spatial index** | High | 3–5 days | `ChunkLattice.ts` | +| 1.2 | **Scoped `_combineVoxelStates`** | High | 2–3 days | `ChunkLattice.ts` | +| 1.3 | **Time-budget collider processing** | Medium | ✅ Done | `playground.ts` | +| 1.4 | **CHUNKS_PER_TICK = 3** | ✅ Done | — | `playground.ts` | +| 1.5 | **Defer environmental entity spawn** | Medium | 1 day | `playground.ts` | + +**1.1–1.2:** Replace global scans with spatial indexing. 
`_getBlockTypePlacements` and `_combineVoxelStates` should only consider chunks within a radius (e.g. 4–5 chunks) of any player. Add a spatial index (e.g. chunk key → block placements) and only merge voxel state for nearby chunks. + +### Phase 2: Main Thread Freedom (2–3 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 2.1 | **Async persistChunk** | Medium | 1–2 days | `PersistenceChunkProvider.ts`, `RegionFileFormat.ts` | +| 2.2 | **Worker terrain gen verification** | — | 0.5 day | `TerrainWorkerPool.ts`, `ProceduralChunkProvider.ts` | +| 2.3 | **Incremental voxel collider updates** | High | 3–5 days | `ChunkLattice.ts` | +| 2.4 | **Chunk send pacing** | Medium | 1–2 days | `NetworkSynchronizer.ts` | + +**2.1:** `persistChunk` currently calls `writeChunk` (sync). Move to async; queue writes and process in background. + +**2.3:** Add blocks to voxel colliders in batches (e.g. 256–512/tick) instead of full chunk. Use Rapier voxel API if it supports incremental updates. + +### Phase 3: Network & Sync (2–3 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 3.1 | **Entity delta/compression** | High | 5–7 days | `NetworkSynchronizer.ts`, `Serializer.ts`, protocol | +| 3.2 | **Chunk delta updates** | Medium | 3–4 days | `NetworkSynchronizer.ts`, `ChunkLattice` | +| 3.3 | **Predictive chunk preload** | Medium | 2–3 days | `playground.ts` | + +**3.1:** Send position/rotation deltas or use quantized floats. Reference: Minecraft’s entity compression, Hytale’s QUIC usage. 
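For the quantized-rotation half of 3.1, the usual technique is "smallest three" quaternion compression. It drops the largest-magnitude component, stores the other three in 16 bits each, and rebuilds the dropped component from the unit-length constraint. The sketch below is minimal and built on those assumptions. `packQuat`, `unpackQuat`, and the `PackedQuat` layout are hypothetical, not an existing protocol schema.

```typescript
type Quat = { x: number; y: number; z: number; w: number };

interface PackedQuat {
  largestIndex: number; // 0..3: which component was dropped
  a: number; // remaining components, in x, y, z, w order, as signed 16-bit integers
  b: number;
  c: number;
}

// The three kept components never exceed 1/√2 in magnitude, so this scale uses the full Int16 range.
const SCALE = 32767 * Math.SQRT2;

function packQuat(q: Quat): PackedQuat {
  const comps = [q.x, q.y, q.z, q.w];
  let largestIndex = 0;
  for (let i = 1; i < 4; i++) {
    if (Math.abs(comps[i]) > Math.abs(comps[largestIndex])) largestIndex = i;
  }
  // q and -q are the same rotation: flip so the dropped component is non-negative.
  const sign = comps[largestIndex] < 0 ? -1 : 1;
  const rest = comps
    .filter((_, i) => i !== largestIndex)
    .map((v) => Math.round(v * sign * SCALE));
  return { largestIndex, a: rest[0], b: rest[1], c: rest[2] };
}

function unpackQuat(p: PackedQuat): Quat {
  const rest = [p.a, p.b, p.c].map((v) => v / SCALE);
  const sumSq = rest[0] ** 2 + rest[1] ** 2 + rest[2] ** 2;
  // Rebuild the dropped component from x² + y² + z² + w² = 1 (clamped against rounding).
  const largest = Math.sqrt(Math.max(0, 1 - sumSq));
  const comps: number[] = [];
  let r = 0;
  for (let i = 0; i < 4; i++) comps.push(i === p.largestIndex ? largest : rest[r++]);
  return { x: comps[0], y: comps[1], z: comps[2], w: comps[3] };
}
```

The decoded quaternion may be `-q` rather than `q`. Both represent the same rotation, so compare the two with `|dot| ≈ 1` rather than component by component.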
+ +### Phase 4: Client Render Pipeline (3–4 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 4.1 | **Greedy meshing (quad merging)** | Very high | 5–7 days | `ChunkWorker.ts` | +| 4.2 | **Vertex pooling** | Medium | 2–3 days | `ChunkMeshManager.ts`, `ChunkWorker.ts` | +| 4.3 | **Occlusion culling always-on** | Medium | 2–3 days | `ChunkManager.ts`, `Renderer.ts` | +| 4.4 | **Mesh apply budget** | Low | 1 day | `ChunkManager.ts` | + +**4.1:** Implement 0fps-style greedy meshing for opaque solids. Merge adjacent same-type faces; expect 2–64× fewer vertices. References: [0fps](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/), [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher). + +### Phase 5: Long-Term & Polish (ongoing) + +| # | Task | Impact | Effort | +|---|------|--------|--------| +| 5.1 | LOD impostors for distant chunks | Medium | 2–3 weeks | +| 5.2 | Brotli for region .bin payloads | Low | 1 week | +| 5.3 | Block/face limits (safety cap) | Low | <1 day | +| 5.4 | Profiling hooks (tick, chunk, mesh) | Low | 2–3 days | + +--- + +## 5. Implementation Order + +``` +Week 1–2: Phase 1 (collider locality, scoped _combineVoxelStates, defer env spawn) +Week 3–4: Phase 2 (async persistChunk, incremental voxel, chunk send pacing) +Week 5–6: Phase 3 (entity delta, chunk delta, predictive preload) +Week 7–10: Phase 4 (greedy meshing, vertex pooling, occlusion) +Ongoing: Phase 5 +``` + +--- + +## 6. Success Metrics + +| Metric | Current (Est.) | Target | +|--------|----------------|--------| +| Lag spikes when walking | Every ~5 steps | None within preload radius | +| Server tick time (p99) | 50–200 ms | < 16 ms | +| Chunk load (blocking) | 20–100 ms | < 5 ms (async) | +| Vertices per flat chunk | ~6000 | ~200–500 (greedy) | +| Client frame time | Spikes on new chunks | Stable ~16 ms (60 fps) | +| Entity packet share | ~90% | < 50% (delta/compression) | + +--- + +## 7. 
Key Files Reference + +| Component | Path | +|-----------|------| +| Chunk load loop | `server/src/playground.ts` | +| Collider processing | `server/src/worlds/blocks/ChunkLattice.ts` | +| Physics simulation | `server/src/worlds/physics/Simulation.ts` | +| Mesh generation | `client/src/workers/ChunkWorker.ts` | +| Chunk sync | `server/src/networking/NetworkSynchronizer.ts` | +| Region I/O | `server/src/worlds/maps/RegionFileFormat.ts` | +| Terrain gen | `server/src/worlds/maps/TerrainGenerator.ts`, `TerrainWorkerPool.ts` | +| Procedural provider | `server/src/worlds/maps/ProceduralChunkProvider.ts` | +| Persistence provider | `server/src/worlds/maps/PersistenceChunkProvider.ts` | +| World loop | `server/src/worlds/WorldLoop.ts` | + +--- + +## 8. PR #21 Action Items + +1. **Merge PR #21** for JSON-map games (boilerplate, huntcraft, etc.). +2. **Reuse chunk cache collider patterns** in `ChunkLattice` if applicable. +3. **Later:** Consider Brotli for region payloads or a similar compression layer. + +--- + +## 9. 
References + +- [VOXEL_PERFORMANCE_MASTER_PLAN.md](./VOXEL_PERFORMANCE_MASTER_PLAN.md) +- [CHUNK_LOADING_ARCHITECTURE.md](./CHUNK_LOADING_ARCHITECTURE.md) +- [VOXEL_RENDERING_RESEARCH.md](./VOXEL_RENDERING_RESEARCH.md) +- [OPTIMIZATION_STRATEGY.md](./OPTIMIZATION_STRATEGY.md) +- [PR #21 – Compressed world maps](https://github.com/hytopiagg/hytopia-source/pull/21) +- [0fps Greedy Meshing](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/) +- [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher) +- [Minecraft Chunk Loading (Technical Wiki)](https://techmcdocs.github.io/pages/GameMechanics/ChunkLoading/) +- [Hytale Engine Technical Deep Dive](https://hytalecharts.com/news/hytale-engine-technical-deep-dive) diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_ENGINE_2026_MASTER_PLAN.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_ENGINE_2026_MASTER_PLAN.md new file mode 100644 index 00000000..c74ee120 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_ENGINE_2026_MASTER_PLAN.md @@ -0,0 +1,218 @@ +# Voxel Engine 2026: World-Class Performance Master Plan + +**Document Owner:** Head of Development +**Classification:** Engineering Roadmap +**Target:** Minecraft/Hytale-grade smoothness; browser-first, 2026-ready +**Version:** 1.0 +**Date:** March 2026 + +--- + +## Executive Summary + +Hytopia aims to deliver voxel gameplay that feels as smooth and responsive as Minecraft and Hytale, while running in the browser. The current architecture has solid foundations—async chunk loading, worker terrain generation, deferred colliders—but several bottlenecks prevent parity with industry leaders. This plan addresses those gaps with a phased, research-backed approach that delivers measurable improvements without over-engineering. + +**Key thesis:** The lag and stutter are almost entirely **software architecture** issues, not hardware. 
Minecraft and Hytale run smoothly on similar hardware because they use different architectural patterns. We close the gap by adopting those patterns. + +**Target outcome:** Walk/fly through a procedural world with **no perceptible lag spikes** within the preload radius, **stable 60 FPS** on the client, and **<16 ms server tick times** (p99). + +--- + +## Part 1: Strategic Context + +### 1.1 Industry Benchmark: What “On Par” Means + +| Game | Chunk Load | Physics | Rendering | Network | Notes | +|------|------------|---------|-----------|---------|-------| +| **Minecraft Java** | Worker threads, region format | Per-chunk colliders, deferred | Greedy meshing (approximate), occlusion | Delta-style entity sync | 15+ years of iteration | +| **Minecraft Bedrock** | Async pipeline, priority queue | Spatial partitioning | Meshing + LOD | Variable tick rate by distance | C++; mobile-first | +| **Hytale** | Worker pool, variable chunk sizes | Batched, spatial | Mesh culling, LOD | QUIC, lower latency | Modern engine, Flecs ECS | +| **Bloxd.io** | Browser streaming | Custom voxel physics | Face culling, vertex pooling | JS-based | Browser-only | + +**Hytopia’s position:** We are browser-bound (Node server + Web client). On the client we cannot run native code directly (only WebAssembly), and extra cores are reachable only through Web Workers, but we *can* adopt the same *concepts*: async I/O, spatial locality, greedy meshing, quantized network formats, and time-budgeted main-thread work.
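Time-budgeted main-thread work means spending a fixed slice of each tick on deferred jobs (collider builds, mesh uploads, chunk sends) and carrying the rest over to the next tick. The sketch below is minimal and assumes a per-tick budget in milliseconds. `BudgetedQueue` is a hypothetical class, not existing Hytopia code. The clock is injectable, so the behavior can be tested without real timing.

```typescript
type Job = () => void;

class BudgetedQueue {
  private jobs: Job[] = [];
  private head = 0; // index of the next job; avoids O(n) Array.shift()
  private readonly budgetMs: number;
  private readonly now: () => number;

  constructor(budgetMs: number, now: () => number = () => performance.now()) {
    this.budgetMs = budgetMs;
    this.now = now;
  }

  enqueue(job: Job): void {
    this.jobs.push(job);
  }

  get pending(): number {
    return this.jobs.length - this.head;
  }

  /** Runs jobs until the queue is empty or the budget is spent; returns how many ran. */
  runTick(): number {
    const start = this.now();
    let ran = 0;
    while (this.head < this.jobs.length) {
      // Always run at least one job so a single slow job cannot stall the queue forever.
      if (ran > 0 && this.now() - start >= this.budgetMs) break;
      const job = this.jobs[this.head++];
      job();
      ran++;
    }
    // Drop consumed entries once they dominate the array.
    if (this.head > 1024 && this.head * 2 > this.jobs.length) {
      this.jobs = this.jobs.slice(this.head);
      this.head = 0;
    }
    return ran;
  }
}
```

The queue checks the clock between jobs, so one job can overshoot the budget. Keep each job small, for example one chunk's collider batch, so the overshoot stays bounded.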
+ +### 1.2 Gap Analysis (Prioritized) + +| Priority | Gap | Impact | Root Cause | +|----------|-----|--------|------------| +| P0 | Collider work O(world) | Tick spikes, unplayable under load | `_combineVoxelStates` scans all chunks of each block type | +| P0 | No greedy meshing | 2–64× more vertices than needed | Per-face quads, no merging | +| P1 | Entity sync volume | ~90% of packets | Full pos/rot floats, no quantization | +| P1 | Sync chunk persist | Main-thread blocking | `writeChunk` sync | +| P2 | No occlusion culling | Overdraw in caves | All loaded batches rendered | +| P2 | No distance-based entity LOD | Far entities same cost as near | Single sync rate | +| P3 | Vertex allocation churn | GC spikes on mesh updates | No pooling | + +--- + +## Part 2: Phased Roadmap + +### Phase 0: Foundation & Instrumentation (Week 1) + +**Goal:** Establish baselines and guardrails before major refactors. + +| Task | Owner | Deliverable | +|------|-------|-------------| +| Profiling hooks | Eng | Tick duration, chunk load time, collider time, mesh build time | +| Metrics dashboard | Eng | Real-time charts for key metrics | +| Block/face limits | Eng | Hard cap (e.g. 500K faces) to avoid meltdown | +| Regression suite | QA | Automated “fly-through” test, capture tick/frame times | + +**Success:** We can measure and reproduce performance issues in CI and on-device. + +--- + +### Phase 1: Collider Locality (Weeks 2–3) + +**Goal:** Remove O(world) collider scans. Physics and chunk work must scale with **visible/nearby** chunks only. 
+ +| Task | Effort | Description | +|------|--------|-------------| +| Spatial index for block placements | 3 days | Chunk key → block placements; no global iteration | +| Scoped `_combineVoxelStates` | 2 days | Merge only chunks within N chunks of any player | +| Collider unload for distant chunks | 1 day | Remove colliders when chunk unloads; don’t keep in physics | +| Time-budget verification | 0.5 day | Ensure 8 ms cap is respected; tune if needed | + +**Files:** `ChunkLattice.ts`, `playground.ts` + +**Success:** Tick time (p99) drops from 50–200 ms to <25 ms under typical load. + +--- + +### Phase 2: Main-Thread Freedom (Weeks 4–5) + +**Goal:** No sync blocking on I/O or heavy computation on the game loop. + +| Task | Effort | Description | +|------|--------|-------------| +| Async `persistChunk` | 1.5 days | Queue writes; flush in background | +| Async provider audit | 0.5 day | Confirm `requestChunk` → `getChunkAsync` path is used | +| Incremental voxel collider updates | 4 days | Add blocks in batches (256–512/tick) instead of full chunk | +| Chunk send pacing | 1.5 days | Smooth chunk sync; avoid burst of 8 chunks in one tick | + +**Files:** `PersistenceChunkProvider.ts`, `RegionFileFormat.ts`, `ChunkLattice.ts`, `NetworkSynchronizer.ts` + +**Success:** Chunk load + persist never block tick; no “catch up” spikes. + +--- + +### Phase 3: Entity Sync Compression (Weeks 6–7) + +**Goal:** Reduce entity pos/rot from ~90% of packets to <50%, with no perceptible quality loss. 
+ +| Task | Effort | Description | +|------|--------|-------------| +| Quantized position (1/256 block, 16-bit) | 1 day | Server sends `pq`; client decodes | +| Yaw-only rotation for players | 0.5 day | 1 float vs 4 for player avatars | +| Distance-based sync rate (30/15/5 Hz) | 1 day | Near = 30 Hz, mid = 15 Hz, far = 5 Hz | +| Quantized quaternion (smallest-three) | 2 days | For NPCs and other full-rotation entities | +| Bulk pos/rot packet (optional) | 2 days | Structure-of-arrays for unreliable updates | + +**Files:** `Serializer.ts`, `NetworkSynchronizer.ts`, `protocol/schemas/Entity.ts`, `Deserializer.ts`, `EntityManager.ts` + +**Success:** Entity sync bytes/update reduced by 50–60%; bandwidth share <50%. + +--- + +### Phase 4: Greedy Meshing (Weeks 8–10) + +**Goal:** Cut vertex count by 2–64× for typical terrain; stable 60 FPS on chunk load. + +| Task | Effort | Description | +|------|--------|-------------| +| Greedy mesh algorithm (opaque solids) | 5 days | 0fps-style sweep and merge; ref `docs/research/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md` | +| Integration with ChunkWorker | 2 days | Per-batch-type merge; transparent blocks unchanged | +| AO + lighting on merged quads | 1 day | Ensure ambient occlusion and lighting still apply | +| Benchmarks and tuning | 1 day | Measure build time vs vertex reduction | + +**Files:** `ChunkWorker.ts`, `ChunkMeshManager.ts` + +**Success:** Flat chunk: ~6000 vertices → ~200–500; frame time stable on new chunk load. + +--- + +### Phase 5: Render Pipeline Polish (Weeks 11–13) + +**Goal:** GPU efficiency and graceful degradation on low-end devices. 
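Phase 0's face cap and Phase 5's "Reduce view distance when over cap" together suggest a simple degradation rule for low-end devices. Sum faces ring by ring outward from the camera, and stop at the last distance that still fits the cap. `BatchInfo` and `viewDistanceForFaceBudget` are hypothetical. Distance is assumed to be an integer ring index measured in batches.

```ts
// Hypothetical sketch: choose the largest view distance whose faces fit a cap.
interface BatchInfo {
  ring: number; // integer distance from the camera's batch, in batches
  faceCount: number; // visible faces in this batch's current mesh
}

function viewDistanceForFaceBudget(batches: BatchInfo[], maxViewDistance: number, faceCap: number): number {
  // facesPerRing[d] = total faces of all batches exactly d rings out
  const facesPerRing = new Array<number>(maxViewDistance + 1).fill(0);
  for (const b of batches) {
    if (b.ring >= 0 && b.ring <= maxViewDistance) facesPerRing[b.ring] += b.faceCount;
  }
  let total = 0;
  for (let d = 0; d <= maxViewDistance; d++) {
    total += facesPerRing[d];
    // Ring d pushes the total over the cap: fall back to the previous ring,
    // but always keep ring 0 (the camera's own batch).
    if (total > faceCap) return Math.max(0, d - 1);
  }
  return maxViewDistance;
}
```

In practice some hysteresis between shrinking and growing the distance would likely be needed, so the view does not oscillate as batches load and unload.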
+ +| Task | Effort | Description | +|------|--------|-------------| +| Vertex pooling | 2 days | Reuse BufferGeometry/ArrayBuffers; avoid per-frame allocations | +| Occlusion culling always-on | 2 days | BFS from camera; cull hidden batches | +| Mesh apply budget | 1 day | Limit meshes applied per frame; spread load | +| Block/face limits enforcement | 0.5 day | Reduce view distance when over cap | + +**Files:** `ChunkMeshManager.ts`, `ChunkManager.ts`, `ChunkWorker.ts`, `Renderer.ts` + +**Success:** No GC spikes on chunk load; overdraw reduced in cave-heavy areas. + +--- + +### Phase 6: Long-Term (Month 4+) + +| Task | Impact | Effort | +|------|--------|--------| +| LOD impostors for distant chunks | Medium | 2–3 weeks | +| Brotli (or similar) for region payloads | Low | 1 week | +| Predictive chunk preload | Medium | 1 week | +| Client-side entity prediction | Medium (latency) | 2+ weeks | + +--- + +## Part 3: Research Documentation + +The following research docs support implementation and design decisions: + +| Document | Purpose | +|----------|---------| +| [MINECRAFT_ARCHITECTURE_RESEARCH.md](./research/MINECRAFT_ARCHITECTURE_RESEARCH.md) | How Minecraft structures chunk loading, colliders, and meshing | +| [GREEDY_MESHING_IMPLEMENTATION_GUIDE.md](./research/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md) | Step-by-step greedy meshing for ChunkWorker | +| [COLLIDER_ARCHITECTURE_RESEARCH.md](./research/COLLIDER_ARCHITECTURE_RESEARCH.md) | Spatial locality and incremental colliders | +| [NETWORK_PROTOCOL_2026_RESEARCH.md](./research/NETWORK_PROTOCOL_2026_RESEARCH.md) | Modern entity sync: quantization, delta, LOD | + +**Mandate:** Engineers implementing Phase 2+ work must read the relevant research doc before coding. 
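The Phase 5 occlusion item ("BFS from camera; cull hidden batches") can be sketched coarsely, assuming each batch can be classified as either solid throughout or not. The cave-culling approach described in the rendering research tracks connectivity through air inside each section, which is finer-grained. This version only stops the flood at batches that are fully opaque. `visibleBatches` and the grid representation are hypothetical. Frustum culling would still run separately.

```ts
// Hypothetical, simplified cave-culling flood fill over a 3D grid of batches.
type Cell = [number, number, number];

function visibleBatches(
  size: Cell, // grid dimensions in batches
  isOpaque: (x: number, y: number, z: number) => boolean, // batch is solid throughout
  camera: Cell, // batch containing the camera
): Set<string> {
  const key = (x: number, y: number, z: number): string => `${x},${y},${z}`;
  const inBounds = (x: number, y: number, z: number): boolean =>
    x >= 0 && y >= 0 && z >= 0 && x < size[0] && y < size[1] && z < size[2];
  const dirs: Cell[] = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]];
  const visible = new Set<string>([key(...camera)]);
  const queue: Cell[] = [camera];
  for (let head = 0; head < queue.length; head++) {
    const [x, y, z] = queue[head];
    // Opaque batches are reached (their outer faces may show) but not crossed.
    // The camera's own batch always expands.
    if (head > 0 && isOpaque(x, y, z)) continue;
    for (const [dx, dy, dz] of dirs) {
      const nx = x + dx;
      const ny = y + dy;
      const nz = z + dz;
      if (!inBounds(nx, ny, nz)) continue;
      const k = key(nx, ny, nz);
      if (visible.has(k)) continue;
      visible.add(k);
      queue.push([nx, ny, nz]);
    }
  }
  return visible;
}
```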
+ +--- + +## Part 4: Success Metrics + +| Metric | Baseline (Current) | Phase 3 Target | Phase 6 Target | +|--------|--------------------|----------------|----------------| +| Server tick time (p99) | 50–200 ms | <25 ms | <16 ms | +| Chunk load (blocking) | 20–100 ms | 0 (async) | 0 | +| Vertices per flat chunk | ~6000 | ~200–500 | ~200–500 | +| Entity sync % of packets | ~90% | ~60% | <50% | +| Client frame time (p99) | Spikes to 50+ ms | <25 ms | <16 ms | +| Perceived lag spikes | Every ~5 steps | None in preload | None | + +--- + +## Part 5: Risks & Mitigations + +| Risk | Mitigation | +|------|------------| +| Greedy meshing regresses build time | Time-budget; fallback to non-greedy if over budget | +| Protocol changes break old clients | Backward-compatible optional fields; version handshake | +| Collider refactor introduces physics bugs | Rigorous test: spawn, walk, mine, place; compare before/after | +| Scope creep | Phases are fixed; Phase 6 is explicitly “long-term” | + +--- + +## Part 6: Dependencies & Prerequisites + +- **PR #21 (Compressed JSON maps):** Merge for JSON-map games; not blocking procedural world. +- **TerrainWorkerPool:** Already in place; verify `getChunkAsync` is used in playground. +- **Protocol package:** Schema changes require protocol version bump; coordinate with SDK consumers. +- **Browser support:** Target evergreen browsers; no polyfills for cutting-edge APIs. + +--- + +## Part 7: Sign-Off + +This plan represents a realistic path to Minecraft/Hytale-grade smoothness for Hytopia’s procedural world. It prioritizes the highest-impact bottlenecks (colliders, greedy meshing, entity sync) and defers nice-to-haves (LOD impostors, prediction) to later phases. + +**Recommendation:** Approve and execute Phase 0–1 immediately. Re-evaluate after Phase 3 based on metrics and user feedback. 
+ +--- + +*— Head of Development* diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_PERFORMANCE_MASTER_PLAN.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_PERFORMANCE_MASTER_PLAN.md new file mode 100644 index 00000000..9b7fdda0 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_PERFORMANCE_MASTER_PLAN.md @@ -0,0 +1,153 @@ +# Voxel Engine Performance Master Plan +## Making Hytopia as Smooth as Minecraft & Hytale + +**Problem:** Lag every ~5 steps; constant chunk rendering; engine feels clunky compared to Minecraft/Hytale. + +**Conclusion:** This is primarily a **software/codebase architecture** issue, not hardware. Minecraft and Hytale run smoothly on similar hardware because they use different architectures. The plan below addresses the gaps. + +--- + +## Part 1: Root Cause Analysis + +### Why "Every 5 Steps" Lag Happens + +| Step | What Happens | Bottleneck | +|------|--------------|------------| +| 1 | Player moves → enters new chunk/batch range | Server loads 1 chunk/tick (CHUNKS_PER_TICK=1) | +| 2 | `getOrCreateChunk` runs | **Sync** disk read or procedural gen blocks main thread | +| 3 | Chunk queued for collider | `processPendingColliderChunks(1)` – 1/tick | +| 4 | `_addChunkBlocksToColliders` | **Heavy:** 4096 blocks, voxel propagation, `_combineVoxelStates` scans ALL chunks of that block type | +| 5 | Server sends chunk to client | Network ok, but chunk sync triggers client work | +| 6 | Client receives ChunksPacket | Posts to ChunkWorker | +| 7 | ChunkWorker builds mesh | **No greedy meshing** – 1 quad per face, 64× more than optimal for flat terrain | +| 8 | Mesh sent back, added to scene | BufferGeometry creation, possible GC spike | +| 9 | Main thread applies mesh | Can cause frame hitch | + +### Current vs. 
Minecraft/Hytale + +| Aspect | Hytopia (Current) | Minecraft / Hytale | +|--------|-------------------|---------------------| +| Chunk load | Sync on main thread | Worker threads, async | +| File I/O | `fs.readSync`, `zlib.gunzipSync` | Async, or worker | +| Terrain gen | Sync in main thread | Worker pool | +| Collider creation | Sync, 1/tick, O(world size) | Deferred, batched, O(chunk) | +| Mesh generation | Worker ✅ | Worker ✅ | +| Greedy meshing | ❌ (1 quad/face) | ✅ (merged quads, 2–64× fewer) | +| LOD | ✅ (step 2/4) | ✅ + impostors | +| Occlusion culling | Only when over face limit | Chunk-section visibility | +| Chunk send rate | Per ADD_CHUNK event | Batched, rate-limited | + +--- + +## Part 2: Prioritized Fixes + +### Tier 1: Quick Wins (1–3 days each) + +| # | Fix | Impact | Effort | Files | +|---|-----|--------|--------|-------| +| 1 | **Increase CHUNKS_PER_TICK** to 2–3 | Fewer "catch up" spikes when moving | 5 min | `playground.ts` | +| 2 | **Time-budget collider processing** | Cap ms per tick (e.g. 
8 ms), process multiple chunks if time allows | Medium | `ChunkLattice.ts`, `playground.ts` | +| 3 | **Chunk send batching** | Don’t flood client; batch chunk sync every N ms or per tick | Medium | `NetworkSynchronizer.ts` | +| 4 | **Avoid collider work for distant chunks** | Only add colliders for chunks within 2–3 chunks of player | Medium | `ChunkLattice.ts`, `playground.ts` | + +### Tier 2: High Impact (3–7 days each) + +| # | Fix | Impact | Effort | Notes | +|---|-----|--------|--------|-------| +| 5 | **Greedy meshing (quad merging)** | 2–64× fewer vertices for terrain | 3–5 days | ChunkWorker; ref 0fps, mikolalysenko/greedy-mesher | +| 6 | **Async chunk provider** | `getChunk()` returns `Promise`; no main-thread blocking | 2–3 days | PersistenceChunkProvider, ProceduralChunkProvider, ChunkLattice | +| 7 | **Worker terrain generation** | Move `generateChunk` to `worker_threads` | 2–3 days | TerrainGenerator, ProceduralChunkProvider | +| 8 | **Async file I/O** | `fs.promises`, `zlib.gunzip` async | 1–2 days | RegionFileFormat.ts | + +### Tier 3: Architectural (1–2 weeks each) + +| # | Fix | Impact | Effort | Notes | +|---|-----|--------|--------|-------| +| 9 | **Incremental colliders** | Add blocks to voxel collider in batches (e.g. 
256/tick) instead of full chunk | High | Rapier voxel API; ChunkLattice | +| 10 | **Collider locality** | `_getBlockTypePlacements` and `_combineVoxelStates` should not scan entire world | High | ChunkLattice; spatial indexing | +| 11 | **Chunk preloading by prediction** | Load chunks in movement direction before player arrives | Medium | playground.ts, loadChunksAroundPlayers | +| 12 | **Vertex pooling** | Reuse BufferGeometry / ArrayBuffers to reduce allocations and GC | Medium | ChunkMeshManager, ChunkWorker | + +### Tier 4: Polish (Ongoing) + +| # | Fix | Impact | Effort | +|---|-----|--------|--------| +| 13 | **Occlusion culling always-on** | Not just when over face limit | Medium | +| 14 | **LOD impostors** | Billboard or simplified mesh for very far chunks | High | +| 15 | **Profiling hooks** | Tick time, chunk load time, mesh build time | Low | +| 16 | **Block/face limits** | Hard cap to avoid meltdown on weak devices | Low | + +--- + +## Part 3: Recommended Implementation Order + +### Phase 1: Stop the Bleeding (Week 1) + +1. **Time-budget collider processing** – Cap at 8 ms/tick; process as many chunks as fit. +2. **Increase CHUNKS_PER_TICK** to 2–3. +3. **Spatial collider culling** – Only create colliders for chunks within 2–3 chunks of any player. +4. **Chunk send batching** – Batch chunk sync; don’t send 10 chunks in one frame. + +### Phase 2: Main Thread Freedom (Week 2–3) + +5. **Async file I/O** – `fs.promises`, async decompress. +6. **Async chunk provider** – `getChunk()` returns `Promise`; ChunkLattice awaits. +7. **Worker terrain gen** – Move `generateChunk` to worker thread. + +### Phase 3: Render Pipeline (Week 4–5) + +8. **Greedy meshing** – Implement in ChunkWorker for opaque solids; merge adjacent same-type faces. +9. **Vertex pooling** – Reuse geometry buffers where possible. + +### Phase 4: Long-Term (Month 2+) + +10. **Incremental colliders** – Batched voxel updates. +11. **Collider locality** – Remove global scans. +12. 
**Occlusion always-on** – Reduce overdraw. + +--- + +## Part 4: Hardware vs. Software + +| Factor | Assessment | +|--------|------------| +| **Hardware** | Unlikely primary cause if Minecraft/Hytale run fine. | +| **Software** | Sync I/O, sync terrain gen, heavy collider work, no greedy meshing – all main-thread and render bottlenecks. | +| **Codebase** | Architecture is serviceable but lacks async pipeline and mesh optimization used by mature voxel engines. | + +--- + +## Part 5: Key Files + +| Component | Path | +|-----------|------| +| Chunk load loop | `server/src/playground.ts` | +| Collider processing | `server/src/worlds/blocks/ChunkLattice.ts` | +| Mesh generation | `client/src/workers/ChunkWorker.ts` | +| Chunk sync to client | `server/src/networking/NetworkSynchronizer.ts` | +| Disk I/O | `server/src/worlds/maps/RegionFileFormat.ts` | +| Terrain generation | `server/src/worlds/maps/TerrainGenerator.ts`, `ProceduralChunkProvider.ts` | +| Client chunk handling | `client/src/chunks/ChunkManager.ts` | + +--- + +## Part 6: Success Metrics + +| Metric | Current (Est.) 
| Target | +|--------|----------------|--------| +| Lag spikes when walking | Every ~5 steps | None within preload radius | +| Tick time (p99) | 50–200 ms | < 16 ms | +| Chunk load time | 20–100 ms (blocking) | < 5 ms (async) | +| Vertices per chunk (flat) | ~6000 (no greedy) | ~200–500 (greedy) | +| Frame time (client) | Spikes on new chunks | Stable 16 ms (60 fps) | + +--- + +## References + +- `docs/CHUNK_LOADING_ARCHITECTURE.md` +- `docs/VOXEL_RENDERING_RESEARCH.md` +- `docs/OPTIMIZATION_STRATEGY.md` +- [0fps Greedy Meshing](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/) +- [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher) +- Hytale engine deep dive: variable chunks, LOD, mesh optimization diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_RENDERING_RESEARCH.md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_RENDERING_RESEARCH.md new file mode 100644 index 00000000..5be2a42d --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/VOXEL_RENDERING_RESEARCH.md @@ -0,0 +1,190 @@ +# Voxel World Smoothness: Research on Minecraft, Hytale, and bloxd + +Deep research into how popular voxel games keep worlds lag-free and smooth during flight/movement. + +--- + +## Summary: What These Games Do + +| Technique | Minecraft | Hytale | bloxd | Hytopia (Current) | +|-----------|-----------|--------|-------|-------------------| +| **Face culling** | ✅ | ✅ | ✅ | ✅ (ChunkWorker) | +| **Greedy meshing** | ✅ (approximation) | ✅ | ✅ | ❌ | +| **Chunk batching** | ✅ (16×16×16) | Variable sizes | ✅ | ✅ (2×2×2 batches) | +| **Async mesh generation** | ✅ (worker) | ✅ | ✅ | ✅ (ChunkWorker) | +| **View distance** | ✅ | ✅ | ✅ | ✅ | +| **LOD (distant simplification)** | ✅ | ✅ | ✅ | ❌ | +| **Occlusion / cave culling** | ✅ (advanced) | Partial | Partial | ❌ | +| **Vertex pooling** | — | — | ✅ | ❌ | +| **Block/face limits** | Implicit | — | — | ❌ | + +--- + +## 1. 
Face Culling (Already Implemented ✅) + +**What it does:** Only render faces that are visible—i.e. faces where the adjacent block is empty or transparent. Interior faces between solid blocks are never drawn. + +**0fps comparison:** On a solid 8×8×8 cube: +- Stupid method: 3,072 quads (6 per block) +- Culling: 384 quads (1 per surface face) +- **~8× reduction** + +**Hytopia status:** Already in `ChunkWorker.ts` (lines 962–985). Neighbor check per face; solid opaque neighbors → face is culled. **No change needed.** + +--- + +## 2. Greedy Meshing / Greedy Quad Merging (Not Implemented ❌) + +**What it does:** Merge adjacent faces with the same texture/material into larger quads. Instead of many small quads, you get fewer large quads covering the same surface. + +**0fps example:** Same 8×8×8 solid cube: +- Culling: 384 quads +- Greedy: **6 quads** (one per side) +- **64× reduction over culling** + +**Algorithm (0fps):** +1. Sweep the 3D volume in 3 directions (X, Y, Z) +2. For each 2D slice, identify visible faces +3. Greedily merge adjacent same-type faces into rectangles +4. Order: top-to-bottom, left-to-right; pick the lexicographically minimal mesh + +**Multiple block types:** Group by (block type, normal direction). Mesh each group separately. + +**Performance trade-off:** +- Greedy is slower to *build* than culling (more passes, more logic) +- But produces far fewer vertices → faster rendering and less GPU memory +- Modern bottleneck is often CPU→GPU transfer; fewer vertices = less data = smoother + +**Hytopia status:** ChunkWorker emits one quad per visible face. No merging. + +**Recommendation:** High impact. Implement greedy meshing in ChunkWorker for opaque solid blocks first. Reference: [0fps greedy meshing](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/), [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher). + +--- + +## 3. 
Occlusion / Cave Culling (Not Implemented ❌) + +**What it does:** Don’t render chunks (or chunk sections) that are completely hidden behind solid terrain. E.g. caves behind a mountain. + +**Minecraft (Tommo’s Advanced Cave Culling, 2014):** +- Works on 16×16×16 chunk sections +- Builds a connectivity graph of transparent/air paths +- BFS from camera to find visible sections +- Culls sections unreachable through air/transparent blocks +- ~14% frame time improvement + +**Hytopia status:** No occlusion culling. All loaded chunks in view distance are rendered if in frustum. + +**Recommendation:** Medium impact, higher complexity. Consider chunk-section visibility BFS. Less urgent than greedy meshing. + +--- + +## 4. Level of Detail (LOD) (Not Implemented ❌) + +**What it does:** Render distant chunks with simpler geometry—fewer quads, lower resolution, or simplified shapes. + +**Hytale:** Variable chunk sizes; LOD where distant chunks use lower-detail meshes. + +**Typical approach:** +- Near: Full detail +- Mid: Merged/simplified mesh +- Far: Very low poly or impostors + +**Hytopia status:** No LOD. All chunks use the same mesh quality. + +**Recommendation:** Medium impact. Could start with “skip every other block” or similar for distant batches. More complex: proper LOD meshes. + +--- + +## 5. Async Mesh Generation (Already Implemented ✅) + +**What it does:** Build chunk meshes in a worker thread so the main thread stays responsive. + +**Hytopia status:** `ChunkWorker.ts` runs in a Web Worker. Mesh building is off the main thread. **Already good.** + +--- + +## 6. Block / Face Limits + +**What it does:** Cap total blocks or faces to avoid overload. E.g. stop loading chunks if face count exceeds a threshold. + +**Hytopia status:** No hard limit. Chunk count is bounded by view distance, but no per-frame or total face limit. + +**Recommendation:** Low priority. Could add a safety cap (e.g. max 500K faces) to avoid extreme lag on weak devices. + +--- + +## 7. 
Vertex Pooling (bloxd / High-Performance Engines) + +**What it does:** Reuse vertex buffers instead of allocating new ones per chunk. Reduces allocations and GC. + +**Impact:** Can improve frame times by tens of percent in allocation-heavy setups. + +**Hytopia status:** New geometry per batch. No pooling. + +**Recommendation:** Lower priority. Consider if profiling shows allocation/GC as a bottleneck. + +--- + +## 8. Server-Side Optimizations (Already Addressed) + +- **View distance:** Reduced default, `/view` command +- **Chunk load/unload:** With grace period +- **Prioritize by view direction:** Load chunks in front first +- **Unload distant chunks:** Keeps memory bounded + +--- + +## Prioritized Implementation Plan + +| Priority | Technique | Impact | Complexity | Effort | +|----------|-----------|--------|------------|--------| +| 1 | **Greedy meshing** | High | Medium | 2–3 days | +| 2 | **LOD for distant chunks** | Medium | Medium | 1–2 days | +| 3 | **Occlusion / cave culling** | Medium | High | 3+ days | +| 4 | **Block/face limit cap** | Low (safety) | Low | <1 day | +| 5 | **Vertex pooling** | Low–Medium | Medium | 1–2 days | + +--- + +## Greedy Meshing Implementation Sketch + +For `ChunkWorker._createChunkBatchGeometries`: + +1. **Current flow:** Per block → per face → if visible → emit quad. +2. **New flow (opaque solids):** + - Collect visible faces with (normal, blockTypeId, textureUri, AO, light) as keys + - For each direction (±X, ±Y, ±Z), build a 2D grid of visible faces + - Run greedy merge per slice (0fps algorithm) + - Emit merged quads instead of per-face quads +3. **Transparent blocks:** Can stay as-is (per-face) or use a separate greedy pass with transparency grouping. +4. **Trimesh blocks:** Keep current logic (no greedy). 
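The core of the 0fps algorithm referenced in this sketch is a 2D merge over one slice's face mask. The example below assumes the slice has already been turned into an `Int32Array` of merge keys, with 0 meaning no visible face. Each key encodes everything that must match for two faces to share a quad (block type, normal, AO, light, per the sketch above). A full mesher would run this for every slice in each of the six face directions. `greedyMergeSlice` and `Quad` are hypothetical names.

```ts
// Sketch of the 0fps-style greedy merge for ONE 2D slice of visible faces.
interface Quad {
  u: number; // start column
  v: number; // start row
  w: number; // width in faces
  h: number; // height in faces
  key: number; // merge key shared by every face in the quad
}

function greedyMergeSlice(mask: Int32Array, width: number, height: number): Quad[] {
  const m = mask.slice(); // work on a copy; consumed cells are zeroed
  const quads: Quad[] = [];
  for (let v = 0; v < height; v++) {
    for (let u = 0; u < width; ) {
      const key = m[u + v * width];
      if (key === 0) {
        u++;
        continue;
      }
      // Extend right while the key matches.
      let w = 1;
      while (u + w < width && m[u + w + v * width] === key) w++;
      // Extend down while the entire w-wide row segment matches.
      let h = 1;
      grow: while (v + h < height) {
        for (let k = 0; k < w; k++) {
          if (m[u + k + (v + h) * width] !== key) break grow;
        }
        h++;
      }
      // Zero the covered cells so they are not emitted twice.
      for (let dv = 0; dv < h; dv++) {
        for (let du = 0; du < w; du++) m[u + du + (v + dv) * width] = 0;
      }
      quads.push({ u, v, w, h, key });
      u += w;
    }
  }
  return quads;
}
```

On an 8×8 slice of a single key this returns one 8×8 quad, which matches the 0fps cube example of one quad per side.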
+ +**References:** +- [0fps Part 1](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/) +- [0fps Part 2 (multiple types)](https://0fps.net/2012/07/07/meshing-minecraft-part-2/) +- [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher) (JS) +- [Vercidium greedy voxel meshing gist](https://gist.github.com/Vercidium/a3002bd083cce2bc854c9ff8f0118d33) + +--- + +## Other Considerations + +- **Runs-based meshing:** Alternative to full greedy; ~20% more triangles but ~4× faster build. Good compromise. +- **GPU-driven rendering:** Modern engines use compute shaders for mesh generation. WebGL limits this; workers are the main option. +- **Chunk size:** Hytopia uses 16³ chunks and 2×2×2 batches (32³). Matches common practice. + +--- + +## Implemented (Hytopia) + +- **LOD:** Distant chunks use step 2 or 4 (half/quarter detail). Underground batches get +1 LOD. +- **Block/face limits:** When total faces > 800K, view distance shrinks to 25% and occlusion runs. +- **Vertex pooling:** Mesh updates reuse existing BufferAttributes when size matches (avoids GPU realloc). +- **Occlusion culling:** BFS from camera through air/liquid; only visible batches rendered when over face limit. +- **Underground LOD:** Batches below Y=40 use one extra LOD step (reduces cave geometry; partial greedy benefit). + +## Conclusion + +The largest missing optimization is **full greedy meshing** (quad merging). Face culling is in place, but merging adjacent same-type faces into larger quads can cut vertex/quad count by roughly 2–10× depending on geometry, which directly reduces GPU work and often improves smoothness when flying. + +LOD and occlusion culling are useful next steps; block limits and vertex pooling are refinements for later. 
diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/ENTITY_SYNC_DELTA_COMPRESSION_DESIGN (1).md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/ENTITY_SYNC_DELTA_COMPRESSION_DESIGN (1).md new file mode 100644 index 00000000..c689a4de --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/ENTITY_SYNC_DELTA_COMPRESSION_DESIGN (1).md @@ -0,0 +1,226 @@ +# Entity Sync: Delta / Compression Design + +**Goal:** Reduce entity position/rotation packet size and bandwidth (currently ~90% of all packets) by replacing full pos/rot with delta or compressed formats. + +--- + +## 1. Current State + +### Flow +- **Server:** Every tick, `entityManager.checkAndEmitUpdates()` runs; each entity calls `checkAndEmitUpdates()`. +- **Entity:** Emits `UPDATE_POSITION` or `UPDATE_ROTATION` when change exceeds threshold: + - **Position:** `ENTITY_POSITION_UPDATE_THRESHOLD_SQ = 0.04²` (0.04 block) + - **Rotation:** `ENTITY_ROTATION_UPDATE_THRESHOLD = cos(3°/2)` (~3°) + - **Player:** Looser position threshold `0.1²` blocks +- **NetworkSynchronizer:** Queues `{ i: id, p: [x,y,z] }` and/or `{ i: id, r: [x,y,z,w] }`. +- **Every 2 ticks (30 Hz):** Splits into reliable vs unreliable; pos/rot-only goes to **unreliable** channel. +- **Serializer:** `serializeVector` → `[x, y, z]`, `serializeQuaternion` → `[x, y, z, w]` (full floats). +- **Transport:** msgpackr with `useFloat32: FLOAT32_OPTIONS.ALWAYS` → 4 bytes per float. + +### Per-Entity Packet Size (approx) +| Format | Bytes (msgpack) | +|--------|-----------------| +| `{ i, p }` pos-only | ~25–35 | +| `{ i, r }` rot-only | ~30–40 | +| `{ i, p, r }` both | ~50–65 | +| 10 entities, pos+rot | ~500–650 | + +With 20 entities at 30 Hz: **~15–20 KB/s** for entity sync alone. + +--- + +## 2. Options for Delta / Compression + +### Option A: Quantized Position (Fixed-Point) + +**Idea:** Encode position as integers. 1 unit = 1/256 block → 0.004 block precision. 
+ +- Range: 16-bit signed per axis holds -32768..32767 quantized units, i.e. only ±128 blocks at 1/256 block; larger worlds need 24-bit values or chunk-relative offsets. +- 3 × 2 bytes = **6 bytes** vs 3 × 4 = 12 bytes (float32). +- **~50% smaller** for position. + +**Implementation:** +```ts +// Server +const QUANT = 256; +p: [Math.round(x * QUANT), Math.round(y * QUANT), Math.round(z * QUANT)] + +// Client +position.x = p[0] / QUANT; // etc. +``` + +**Trade-off:** Precision ~0.004 block. For player/NPC movement this is fine. For very small objects, may need higher quant (e.g. 1024). + +--- + +### Option B: Quantized Quaternion (Smallest-Three) + +**Idea:** Unit quaternion has `q.x² + q.y² + q.z² + q.w² = 1`. Drop the component with the largest magnitude, send its 2-bit index plus the other three (the "smallest three"), and reconstruct the dropped one from the unit-norm constraint; negate `q` first if needed so the dropped component is non-negative, since `q` and `-q` are the same rotation. + +- 3 × 2 bytes (quantized) = **6 bytes**, plus the 2-bit index, vs 4 × 4 = 16 bytes. +- **~62% smaller** for rotation. + +**Implementation:** Standard "smallest three" quaternion compression (e.g. [RigidBodyDynamics](https://github.com/gameworks-builder/rigid-body-dynamics) style). Needs protocol change to support packed format. + +--- + +### Option C: Yaw-Only for Player Rotation + +**Idea:** Many entities (players, NPCs) only rotate around Y. Send 1 float (yaw) instead of 4. + +- **4 bytes** vs 16 bytes. +- **75% smaller** for rotation when applicable. + +**Caveat:** Doesn't work for entities with pitch/roll (e.g. flying, vehicles). Use as opt-in per entity type. + +--- + +### Option D: Delta Encoding (Δ from Last Sent) + +**Idea:** Send `Δp = p - p_last` instead of absolute `p`. Small movements → small deltas → msgpack encodes as smaller integers, provided positions are already quantized to integers (a float delta still costs a full float32). + +- No schema change; still `[dx, dy, dz]` but values typically small. +- msgpack fixints: integers from -32 to 127 use 1 byte. +- **Benefit:** 20–50% smaller when movement is small. No extra state on client if server tracks last-sent. + +**Implementation:** Server stores `_lastSentPosition` per entity per player (or broadcast). Send delta; client adds to last known position. Requires client to track "last applied" position.
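Options A and D combine naturally. Quantize to 1/256 block first, then send integer deltas; msgpack can encode small deltas (-32 to 127) as 1-byte fixints, whereas a float delta would still cost a full float32. The `pq` field name is taken from the schema proposal in section 4 of these notes. `dq`, `DeltaEncoder` and `DeltaDecoder` are hypothetical. This naive version deltas against the last *sent* value and assumes every update arrives in order. On the unreliable channel used for pos/rot, a lost packet would desync the client, so a real design would likely delta against an acknowledged baseline or interleave periodic absolute updates.

```ts
// Hypothetical sketch: Option A quantization + Option D deltas.
const POSITION_QUANT = 256; // 1 unit = 1/256 block

type Vec3 = [number, number, number];
interface PositionUpdate {
  i: number; // entity id
  pq?: Vec3; // absolute quantized position
  dq?: Vec3; // hypothetical: quantized delta from the previous update
}

const quantize = (p: Vec3): Vec3 => [
  Math.round(p[0] * POSITION_QUANT),
  Math.round(p[1] * POSITION_QUANT),
  Math.round(p[2] * POSITION_QUANT),
];

class DeltaEncoder {
  private readonly lastSent = new Map<number, Vec3>(); // entity id -> last quantized position

  encode(id: number, position: Vec3): PositionUpdate {
    const q = quantize(position);
    const prev = this.lastSent.get(id);
    this.lastSent.set(id, q);
    if (!prev) return { i: id, pq: q }; // first update is absolute
    return { i: id, dq: [q[0] - prev[0], q[1] - prev[1], q[2] - prev[2]] };
  }
}

class DeltaDecoder {
  private readonly lastApplied = new Map<number, Vec3>();

  // Returns the position in blocks, or undefined if a delta has no base to apply to.
  decode(msg: PositionUpdate): Vec3 | undefined {
    let q: Vec3 | undefined = msg.pq;
    if (!q && msg.dq) {
      const prev = this.lastApplied.get(msg.i);
      if (!prev) return undefined; // missed the absolute update: needs a resync
      q = [prev[0] + msg.dq[0], prev[1] + msg.dq[1], prev[2] + msg.dq[2]];
    }
    if (!q) return undefined;
    this.lastApplied.set(msg.i, q);
    return [q[0] / POSITION_QUANT, q[1] / POSITION_QUANT, q[2] / POSITION_QUANT];
  }
}
```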
+ +--- + +### Option E: Bulk / SoA (Structure-of-Arrays) Format + +**Idea:** Instead of `[{i:1,p:[x,y,z]},{i:2,p:[x,y,z]},...]` use structure of arrays: + +```ts +{ ids: [1,2,3], p: [[x,y,z],[x,y,z],[x,y,z]] } +``` + +- Avoids repeating keys `i`, `p` for every entity (msgpack dedup helps but structure still has overhead). +- **Benefit:** ~15–25% smaller from less map/array framing. + +**Caveat:** Requires new packet schema and client deserializer changes. All-or-nothing; can't mix with current EntitySchema in same packet. + +--- + +### Option F: Distance-Based Sync Rate + +**Idea:** Sync nearby entities at 30 Hz, distant at 10 Hz or 5 Hz. + +- **Benefit:** Fewer packets for far entities; natural LOD. +- **Implementation:** In `checkAndEmitUpdates` or NetworkSynchronizer, track distance from each player; only queue updates for entity if `tick % rateDivisor === 0` based on distance band. + +--- + +## 3. Recommended Approach + +### Phase 1: Low-Risk Wins (1–2 days each) + +| # | Change | Impact | Effort | +|---|--------|--------|--------| +| 1 | **Quantized position** (1/256 block) | ~50% smaller pos | 1 day | +| 2 | **Distance-based sync rate** (30/15/5 Hz bands) | Fewer far-entity updates | 1 day | +| 3 | **Yaw-only rotation** for player entities | ~75% smaller rot for players | 0.5 day | + +### Phase 2: Schema Changes (3–5 days) + +| # | Change | Impact | Effort | +|---|--------|--------|--------| +| 4 | **Quantized quaternion** (smallest-three) | ~62% smaller rot | 2–3 days | +| 5 | **Bulk entity update packet** | ~15–25% smaller framing | 2 days | + +### Phase 3: Advanced (Optional) + +| # | Change | Impact | Effort | +|---|--------|--------|--------| +| 6 | **Delta encoding** | Additional 20–50% when movement small | 2–3 days | +| 7 | **Client-side prediction** | Reduce perceived latency, fewer corrections | 1+ week | + +--- + +## 4.
Protocol Changes Required + +### Option 1: Extend EntitySchema (Backwards Compatible) + +Add optional compressed fields; client detects and uses when present: + +```ts +// New optional fields +EntitySchema = { + i: number; + p?: VectorSchema; // existing: [x,y,z] float + r?: QuaternionSchema; // existing: [x,y,z,w] float + pq?: [number,number,number]; // quantized position (1/256 block) + rq?: [number,number,number]; // quantized quaternion (smallest-three) + ry?: number; // yaw only (radians) + // ... +} +``` + +- Server sends `pq` instead of `p` when quantized format enabled. +- Client checks `pq` first, falls back to `p`. +- Old clients ignore `pq`; new clients prefer `pq` when present. + +### Option 2: New Packet Type + +Add `EntityPosRotBulkPacket`: + +```ts +{ + ids: number[], + positions?: Int16Array | number[][], // quantized + rotations?: number[][] | Int16Array[] // quantized or yaw-only +} +``` + +- Used only for unreliable pos/rot updates. +- Existing `EntitiesPacket` still used for spawn/reliable updates. + +--- + +## 5. Key Files + +| Component | Path | +|-----------|------| +| Entity update emission | `server/src/worlds/entities/Entity.ts` (checkAndEmitUpdates) | +| Player threshold | `server/src/worlds/entities/PlayerEntity.ts` | +| Network sync queue | `server/src/networking/NetworkSynchronizer.ts` | +| Serializer | `server/src/networking/Serializer.ts` | +| Protocol schema | `protocol/schemas/Entity.ts` | +| Client deserializer | `client/src/network/Deserializer.ts` | +| Client entity update | `client/src/entities/EntityManager.ts` (_updateEntity) | +| Transport | `server/src/networking/Connection.ts`, `client/.../NetworkManager.ts` | + +--- + +## 6. 
Quantization Constants (Suggested) + +```ts +// Position: 1/256 block = 0.0039 block precision +const POSITION_QUANT = 256; + +// Position range: 16-bit signed quantized units (-32768..32767) +// At 1/256 block per unit this spans only ±128 blocks; larger worlds need 24-bit values or chunk-relative offsets +const POSITION_MAX = 32767; +const POSITION_MIN = -32768; + +// Quaternion: 16-bit per component, range [-1, 1] → 1/32767 precision +const QUATERNION_QUANT = 32767; +``` + +--- + +## 7. Success Metrics + +| Metric | Current | Target (Phase 1) | Target (Phase 2) | +|--------|---------|------------------|------------------| +| Entity bytes/update (10 entities) | ~500–650 | ~300–400 | ~200–280 | +| Entity sync % of total packets | ~90% | ~70% | ~50% | +| Bandwidth (20 entities, 30 Hz) | ~15–20 KB/s | ~8–12 KB/s | ~5–8 KB/s | + +--- + +## 8. References + +- [Quaternion Compression (smallest three)](http://gafferongames.com/networked-physics/snapshot-compression/) +- [Minecraft entity sync (delta/quantization)](https://wiki.vg/Protocol#Entity_Metadata) +- Current codebase: `Entity.ts` (checkAndEmitUpdates), `NetworkSynchronizer.ts` (entity sync split), `Serializer.ts` (serializeVector/Quaternion) diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/MAP_ENGINE_ARCHITECTURE (1).md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/MAP_ENGINE_ARCHITECTURE (1).md new file mode 100644 index 00000000..f58f543a --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/MAP_ENGINE_ARCHITECTURE (1).md @@ -0,0 +1,272 @@ +# Hytopia Map Engine Architecture + +This document describes how the Hytopia map engine is set up, its data flow, and a roadmap for adapting it to support **binary maps** for extremely large worlds (e.g., 100k×100k×64 blocks). + +--- + +## 1. Architecture Overview + +The map engine spans **server** (authoritative block state), **client** (rendering, meshing), and **protocol** (network serialization). Maps are loaded once at world initialization and populate a chunk-based block lattice.
+ +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ MAP LOAD PIPELINE │ +├─────────────────────────────────────────────────────────────────────────┤ +│ │ +│ JSON Map File World.loadMap() ChunkLattice │ +│ (blockTypes, blocks, ───────────────► initializeBlockEntries() │ +│ entities) │ │ │ +│ │ │ ▼ │ +│ │ │ ChunkLattice clears, │ +│ │ │ creates Chunks, │ +│ │ │ builds colliders │ +│ │ │ │ │ +│ │ ▼ ▼ │ +│ │ BlockTypeRegistry Map │ +│ │ (block types) (sparse chunks) │ +│ │ │ │ +│ │ ▼ │ +│ │ NetworkSynchronizer │ +│ │ (chunk sync to │ +│ │ clients) │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## 2. WorldMap Interface (JSON Format) + +Maps conform to the `WorldMap` interface used by `World.loadMap()`: + +| Section | Purpose | Location | +|---------------|--------------------------------------------------------|-------------------------------| +| `blockTypes` | Block type definitions (id, name, textureUri, etc.) | `server/src/worlds/World.ts` | +| `blocks` | Block placements keyed by `"x,y,z"` string | `WorldMap.blocks` | +| `entities` | Entity spawns keyed by `"x,y,z"` position | `WorldMap.entities` | + +### Block Format in JSON + +Each block entry is either: + +- **Short form:** `"x,y,z": ` (e.g. `"-25,0,-16": 7`) +- **Extended form:** `"x,y,z": { "i": , "r": }` + +Coordinates are **world block coordinates** (integers). Block type IDs are 0–255 (0 = air, 1–255 = registered block types). 
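Each `blocks` entry goes through roughly three steps before it reaches a chunk. The `"x,y,z"` key is split, both the short and extended value forms are accepted, and the block's chunk is located. `parseBlockEntry` and `ParsedBlock` are hypothetical names, not the actual `World.loadMap()` code. The local index uses the `x + (y << 4) + (z << 8)` layout described in the Chunk Model section, and floor division keeps negative coordinates such as `-25,0,-16` in the correct chunk.

```ts
// Hypothetical sketch: decode one JSON map `blocks` entry and locate it in a 16³ chunk.
type MapBlockValue = number | { i: number; r?: number };

interface ParsedBlock {
  x: number;
  y: number;
  z: number;
  blockTypeId: number;
  rotationIndex?: number;
  chunkOrigin: [number, number, number];
  localIndex: number; // x + (y << 4) + (z << 8) on local coords 0..15
}

const CHUNK_SIZE = 16;

function parseBlockEntry(key: string, value: MapBlockValue): ParsedBlock {
  const [x, y, z] = key.split(',').map(Number);
  if (![x, y, z].every(Number.isInteger)) throw new Error(`bad block key: ${key}`);
  // Short form is a bare block type id; extended form is { i, r }.
  const blockTypeId = typeof value === 'number' ? value : value.i;
  const rotationIndex = typeof value === 'number' ? undefined : value.r;
  // Floor division so negative coordinates land in the correct chunk.
  const origin = (c: number): number => Math.floor(c / CHUNK_SIZE) * CHUNK_SIZE;
  const chunkOrigin: [number, number, number] = [origin(x), origin(y), origin(z)];
  const lx = x - chunkOrigin[0];
  const ly = y - chunkOrigin[1];
  const lz = z - chunkOrigin[2];
  return { x, y, z, blockTypeId, rotationIndex, chunkOrigin, localIndex: lx + (ly << 4) + (lz << 8) };
}
```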
+ +### Size Implications of JSON Maps + +| Factor | Impact | +|---------------------------|-----------------------------------------------------------------------| +| Sparse object keys | Each block = `"x,y,z"` string key (10–20+ chars) + JSON overhead | +| No chunk-level batching | All blocks listed individually; no spatial grouping | +| Parsing cost | Full JSON parse loads entire map into memory before processing | +| File size | `boilerplate-small.json` ≈ 4,600+ lines; `big-world` ≈ 309,000+ lines | + +For a **100k×100k×64** fully dense map: + +- Blocks: 640 billion +- JSON would be impractically huge (hundreds of GB+ as text) +- Even sparse terrain would produce multi-GB JSON for large worlds + +--- + +## 3. Chunk Model + +### Chunk Dimensions + +| Constant | Value | Location | +|----------------|-------|--------------------------------------| +| `CHUNK_SIZE` | 16 | `server/src/worlds/blocks/Chunk.ts` | +| `CHUNK_VOLUME` | 4096 | 16³ blocks per chunk | +| `MAX_BLOCK_TYPE_ID` | 255 | `Chunk.ts` | + +Chunk origins are multiples of 16 on each axis (e.g. `(0,0,0)`, `(16,0,0)`, `(0,16,0)`). + +### Chunk Storage + +- **`Chunk._blocks`:** `Uint8Array(4096)` – block type ID per voxel +- **`Chunk._blockRotations`:** `Map` – sparse map of block index → rotation +- **Block index:** `x + (y << 4) + (z << 8)` (local coords 0–15) + +Chunks are stored in `ChunkLattice._chunks` as `Map` keyed by packed chunk origin: + +```typescript +// ChunkLattice._packCoordinate() – 54 bits per axis +chunkKey = (x << 108) | (y << 54) | z +``` + +--- + +## 4. Load Flow: `World.loadMap()` + +```typescript +// server/src/worlds/World.ts +public loadMap(map: WorldMap) { + this.chunkLattice.clear(); + + // 1. Register block types + if (map.blockTypes) { + for (const blockTypeData of map.blockTypes) { + this.blockTypeRegistry.registerGenericBlockType({ ... }); + } + } + + // 2. 
Iterate blocks as generator, feed to ChunkLattice + if (map.blocks) { + const blockEntries = function* () { + for (const key in mapBlocks) { + const blockValue = mapBlocks[key]; + const blockTypeId = typeof blockValue === 'number' ? blockValue : blockValue.i; + const blockRotationIndex = typeof blockValue === 'number' ? undefined : blockValue.r; + const [x, y, z] = key.split(',').map(Number); + yield { globalCoordinate: { x, y, z }, blockTypeId, blockRotation }; + } + }; + this.chunkLattice.initializeBlockEntries(blockEntries()); + } + + // 3. Spawn entities + if (map.entities) { ... } +} +``` + +### `ChunkLattice.initializeBlockEntries()` + +- Clears the lattice +- For each block: resolves chunk, creates chunk if needed, calls `chunk.setBlock()` +- Tracks block placements per type for colliders +- After all blocks: builds one collider per block type (voxel or trimesh) + +--- + +## 5. Client-Server Chunk Sync + +Chunks are serialized and sent to clients via `NetworkSynchronizer`: + +| Protocol Field | Description | +|----------------|--------------------------------------| +| `c` | Chunk origin `[x, y, z]` | +| `b` | Block IDs `Uint8Array \| number[]` (4096) | +| `r` | Rotations: flat `[blockIndex, rotIndex, ...]` | +| `rm` | Chunk removed flag | + +- **Serializer:** `Serializer.serializeChunk()` → `protocol.ChunkSchema` +- **Client:** `Deserializer.deserializeChunk()` → `DeserializedChunk` +- **ChunkWorker:** Receives `chunk_update`, registers chunk, builds meshes + +The client does **not** load the JSON map. It receives chunks from the server over the network after a player joins a world. + +--- + +## 6. 
Key Files Reference + +| Component | Path | +|----------------------|--------------------------------------------------| +| WorldMap interface | `server/src/worlds/World.ts` | +| loadMap | `server/src/worlds/World.ts` | +| ChunkLattice | `server/src/worlds/blocks/ChunkLattice.ts` | +| Chunk | `server/src/worlds/blocks/Chunk.ts` | +| ChunkSchema (proto) | `protocol/schemas/Chunk.ts` | +| Serializer | `server/src/networking/Serializer.ts` | +| ChunkWorker (client) | `client/src/workers/ChunkWorker.ts` | +| Deserializer | `client/src/network/Deserializer.ts` | + +--- + +## 7. Binary Map Adaptation Roadmap for 100k×100k×64 + +To support huge maps efficiently, the engine should move from JSON to **binary map sources** with **chunk-level loading** and **streaming**. + +### 7.1 Binary Chunk Format (Proposed) + +Store one file or region per chunk (or region of chunks): + +``` +chunk.{cx}.{cy}.{cz}.bin OR region.{rx}.{ry}.{rz}.bin +``` + +**Suggested layout per chunk (raw):** + +| Offset | Size | Content | +|--------|--------|------------------------------------------| +| 0 | 12 | Origin (3× int32: x, y, z) | +| 12 | 4096 | Block IDs (Uint8Array) | +| 4108 | var | Sparse rotations: count + [idx, rot]... | + +Or use a compact format (e.g. run-length encoding for air, or palette indices) for sparse chunks. + +### 7.2 Streaming / Lazy Loading + +- **Do not** load the entire map into memory. +- Use a **chunk provider** that: + - Accepts `(chunkOriginX, chunkOriginY, chunkOriginZ)` and returns chunk data + - Reads from binary files, memory-mapped files, or a database +- Replace the current `loadMap()` bulk load with: + - Initial load of a small seed area (e.g. spawn region) + - On-demand loading when `ChunkLattice.getOrCreateChunk()` needs a chunk not yet in memory + +### 7.3 Implementation Strategy + +1. 
**`MapProvider` interface** + ```typescript + interface MapProvider { + getChunk(origin: Vector3Like): ChunkData | null | Promise; + getBlockTypes(): BlockTypeOptions[]; + } + ``` + +2. **`BinaryMapProvider`** + - Reads `.bin` chunk files from disk or object storage + - Maps chunk origin → file path or byte range + - Returns `{ blocks: Uint8Array, rotations: Map }` + +3. **ChunkLattice changes** + - Replace `initializeBlockEntries()` full load with lazy `getOrCreateChunk()` that: + - Checks `_chunks` cache + - If miss: calls `MapProvider.getChunk()`, creates `Chunk`, inserts into `_chunks` + - Optionally preload chunks in a radius around player(s) + +4. **Block types** + - Keep block types in a small JSON or separate binary; they are tiny compared to block data. + - Load once at startup; no need to stream. + +### 7.4 Scale Estimates for 100k×100k×64 + +| Metric | Value | +|---------------------------|--------------------------| +| World dimensions | 100,000 × 100,000 × 64 | +| Chunks (16³) | 6,250 × 6,250 × 4 ≈ 156M chunks | +| Bytes per chunk (raw) | ~4.1 KB (blocks only) | +| Raw block data (if dense) | ~640 GB | +| Sparse (e.g. surface) | Much less; only store non-air chunks | + +Binary format advantages: + +- No JSON parsing; direct `Uint8Array` use +- Chunk-level I/O; load only what’s needed +- Possible memory-mapping for large files +- Optional compression (e.g. LZ4, Zstd) per chunk or region + +### 7.5 Migration Path + +1. **Phase 1:** Add `BinaryMapProvider` that reads chunk `.bin` files; `loadMap()` can accept `WorldMap | MapProvider`. +2. **Phase 2:** Make `ChunkLattice.getOrCreateChunk()` use the provider when a chunk is missing. +3. **Phase 3:** Add tooling to convert existing JSON maps → binary chunk files. +4. **Phase 4:** Optional region/compression format for production. + +--- + +## 8. 
Summary + +| Current (JSON) | Target (Binary + Streaming) | +|----------------------------|----------------------------------| +| Full map in memory | Chunk-level loading | +| Single large JSON parse | Small reads per chunk | +| Sparse object keys | Dense `Uint8Array` per chunk | +| Not viable for 100k³ scale | Designed for huge worlds | + +The existing `Chunk` and `ChunkLattice` design already matches a chunk-oriented model. The main changes are: + +1. Replace JSON as the map source with a binary chunk provider. +2. Add lazy loading so chunks are fetched on demand. +3. Provide conversion tools and a clear binary chunk layout. diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/SMOOTH_WORLD_STREAMING_REFACTOR_PLAN (1).md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/SMOOTH_WORLD_STREAMING_REFACTOR_PLAN (1).md new file mode 100644 index 00000000..cead4812 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/SMOOTH_WORLD_STREAMING_REFACTOR_PLAN (1).md @@ -0,0 +1,180 @@ +# Smooth World Streaming Refactor Plan + +> **Canonical roadmap:** See [VOXEL_ENGINE_2026_MASTER_PLAN.md](./VOXEL_ENGINE_2026_MASTER_PLAN.md) for the full executive plan and phased roadmap. This document provides additional context and cross-references. + +**Goal:** Peak performance for the procedurally generated world—smooth streaming, no lag spikes, Minecraft/Hytale/bloxd-level polish. + +**Sources:** Codebase analysis, [VOXEL_PERFORMANCE_MASTER_PLAN.md](./VOXEL_PERFORMANCE_MASTER_PLAN.md), [CHUNK_LOADING_ARCHITECTURE.md](./CHUNK_LOADING_ARCHITECTURE.md), [VOXEL_RENDERING_RESEARCH.md](./VOXEL_RENDERING_RESEARCH.md), [PR #21](https://github.com/hytopiagg/hytopia-source/pull/21), and industry patterns from Minecraft, Hytale, and Bloxd. + +--- + +## 1. 
Competitive Analysis: Minecraft vs Hytale vs Bloxd vs Hytopia + +| Aspect | Minecraft | Hytale | Bloxd | Hytopia (Current) | +|--------|-----------|--------|-------|-------------------| +| **Chunk load** | Worker threads, async | Worker pool | JS async | ✅ `requestChunk` + `getChunkAsync` (TerrainWorkerPool) | +| **File I/O** | Async | Async | N/A (streaming) | ✅ `readChunkAsync` (PersistenceChunkProvider) | +| **Terrain gen** | Worker threads | Worker pool | — | ✅ `generateChunkAsync` (TerrainWorkerPool) | +| **Physics colliders** | Deferred, O(chunk) | Batched, spatial | Custom voxel | ❌ Sync, O(world) via `_combineVoxelStates` | +| **Collider locality** | Per-chunk, near player | Spatial culling | — | ⚠️ Partial (COLLIDER_MAX_CHUNK_DISTANCE=3) | +| **Greedy meshing** | ✅ | ✅ (mesh culling) | ✅ | ❌ 1 quad/face, ~64× extra geometry | +| **Chunk send rate** | Incremental, rate-limited | Batched | Streaming | ⚠️ MAX_CHUNKS_PER_SYNC=8, can burst | +| **Entity sync** | Delta / compressed | — | — | Full pos/rot 30 Hz, 90%+ of packets | +| **LOD** | ✅ | Variable chunk sizes | — | ✅ (step 2/4) | +| **Occlusion** | Cave culling | Partial | — | ⚠️ Only when over face limit | +| **Vertex pooling** | — | — | ✅ | ⚠️ Partial (size-match reuse) | +| **Map compression** | Region format | — | — | ❌ JSON maps large; PR #21 adds compression | + +**Gap summary:** Hytopia’s biggest gaps are (1) collider work O(world) and sync, (2) no greedy meshing, (3) entity sync volume, (4) JSON map size for non-procedural games. Procedural world already uses async load + worker terrain gen; collider and client-side mesh work are the main bottlenecks. + +--- + +## 2. PR #21 Relevance to Procedural World + +[PR #21: Compressed world maps](https://github.com/hytopiagg/hytopia-source/pull/21) targets **JSON maps** (`loadMap(map.json)`), not procedural/region worlds. It adds: + +| Feature | Applies to Procedural? 
| Notes | +|---------|------------------------|-------| +| `map.compressed.json` | ❌ | JSON map format only | +| `map.chunks.bin` (chunk cache) | ❌ | Prebaked JSON map chunks | +| Chunk cache collider build | ⚠️ Partially | “perf: speed up chunk cache collider build” can inform collider design | +| Brotli compression | ❌ | For map JSON, not region .bin | +| Auto-detect / `hytopia map-compress` | ❌ | JSON map workflow | + +**Recommendation:** Merge PR #21 for JSON-map games (huntcraft, boilerplate, etc.). For procedural world, reuse the collider build approach where relevant. Procedural persistence uses region `.bin`; consider Brotli for region payloads later. + +--- + +## 3. Root Cause Summary + +When a player joins and blocks have physics: + +1. **Physics step (60 Hz):** Rapier steps the entire world, including all block colliders + player rigid body. +2. **Collider creation:** `_addChunkBlocksToColliders` → `_combineVoxelStates` scans all chunks of each block type (O(world)). +3. **Entity sync (30 Hz):** Full position/rotation for entities/players every 2 ticks; dominates packet volume. +4. **Chunk sync:** Up to 8 chunks per sync; client mesh build can spike main thread. +5. **Client mesh:** No greedy meshing → 2–64× more vertices than needed. +6. **ADD_CHUNK events:** Environmental entity spawn per chunk runs synchronously. + +--- + +## 4. Refactoring Plan (Prioritized) + +### Phase 1: Stop the Bleeding (1–2 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 1.1 | **Collider locality – spatial index** | High | 3–5 days | `ChunkLattice.ts` | +| 1.2 | **Scoped `_combineVoxelStates`** | High | 2–3 days | `ChunkLattice.ts` | +| 1.3 | **Time-budget collider processing** | Medium | ✅ Done | `playground.ts` | +| 1.4 | **CHUNKS_PER_TICK = 3** | ✅ Done | — | `playground.ts` | +| 1.5 | **Defer environmental entity spawn** | Medium | 1 day | `playground.ts` | + +**1.1–1.2:** Replace global scans with spatial indexing. 
`_getBlockTypePlacements` and `_combineVoxelStates` should only consider chunks within a radius (e.g. 4–5 chunks) of any player. Add a spatial index (e.g. chunk key → block placements) and only merge voxel state for nearby chunks. + +### Phase 2: Main Thread Freedom (2–3 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 2.1 | **Async persistChunk** | Medium | 1–2 days | `PersistenceChunkProvider.ts`, `RegionFileFormat.ts` | +| 2.2 | **Worker terrain gen verification** | — | 0.5 day | `TerrainWorkerPool.ts`, `ProceduralChunkProvider.ts` | +| 2.3 | **Incremental voxel collider updates** | High | 3–5 days | `ChunkLattice.ts` | +| 2.4 | **Chunk send pacing** | Medium | 1–2 days | `NetworkSynchronizer.ts` | + +**2.1:** `persistChunk` currently calls `writeChunk` (sync). Move to async; queue writes and process in background. + +**2.3:** Add blocks to voxel colliders in batches (e.g. 256–512/tick) instead of full chunk. Use Rapier voxel API if it supports incremental updates. + +### Phase 3: Network & Sync (2–3 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 3.1 | **Entity delta/compression** | High | 5–7 days | `NetworkSynchronizer.ts`, `Serializer.ts`, protocol | +| 3.2 | **Chunk delta updates** | Medium | 3–4 days | `NetworkSynchronizer.ts`, `ChunkLattice` | +| 3.3 | **Predictive chunk preload** | Medium | 2–3 days | `playground.ts` | + +**3.1:** Send position/rotation deltas or use quantized floats. Reference: Minecraft’s entity compression, Hytale’s QUIC usage. 
+ +### Phase 4: Client Render Pipeline (3–4 weeks) + +| # | Task | Impact | Effort | Files | +|---|------|--------|--------|-------| +| 4.1 | **Greedy meshing (quad merging)** | Very high | 5–7 days | `ChunkWorker.ts` | +| 4.2 | **Vertex pooling** | Medium | 2–3 days | `ChunkMeshManager.ts`, `ChunkWorker.ts` | +| 4.3 | **Occlusion culling always-on** | Medium | 2–3 days | `ChunkManager.ts`, `Renderer.ts` | +| 4.4 | **Mesh apply budget** | Low | 1 day | `ChunkManager.ts` | + +**4.1:** Implement 0fps-style greedy meshing for opaque solids. Merge adjacent same-type faces; expect 2–64× fewer vertices. References: [0fps](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/), [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher). + +### Phase 5: Long-Term & Polish (ongoing) + +| # | Task | Impact | Effort | +|---|------|--------|--------| +| 5.1 | LOD impostors for distant chunks | Medium | 2–3 weeks | +| 5.2 | Brotli for region .bin payloads | Low | 1 week | +| 5.3 | Block/face limits (safety cap) | Low | <1 day | +| 5.4 | Profiling hooks (tick, chunk, mesh) | Low | 2–3 days | + +--- + +## 5. Implementation Order + +``` +Week 1–2: Phase 1 (collider locality, scoped _combineVoxelStates, defer env spawn) +Week 3–4: Phase 2 (async persistChunk, incremental voxel, chunk send pacing) +Week 5–6: Phase 3 (entity delta, chunk delta, predictive preload) +Week 7–10: Phase 4 (greedy meshing, vertex pooling, occlusion) +Ongoing: Phase 5 +``` + +--- + +## 6. Success Metrics + +| Metric | Current (Est.) | Target | +|--------|----------------|--------| +| Lag spikes when walking | Every ~5 steps | None within preload radius | +| Server tick time (p99) | 50–200 ms | < 16 ms | +| Chunk load (blocking) | 20–100 ms | < 5 ms (async) | +| Vertices per flat chunk | ~6000 | ~200–500 (greedy) | +| Client frame time | Spikes on new chunks | Stable ~16 ms (60 fps) | +| Entity packet share | ~90% | < 50% (delta/compression) | + +--- + +## 7. 
Key Files Reference + +| Component | Path | +|-----------|------| +| Chunk load loop | `server/src/playground.ts` | +| Collider processing | `server/src/worlds/blocks/ChunkLattice.ts` | +| Physics simulation | `server/src/worlds/physics/Simulation.ts` | +| Mesh generation | `client/src/workers/ChunkWorker.ts` | +| Chunk sync | `server/src/networking/NetworkSynchronizer.ts` | +| Region I/O | `server/src/worlds/maps/RegionFileFormat.ts` | +| Terrain gen | `server/src/worlds/maps/TerrainGenerator.ts`, `TerrainWorkerPool.ts` | +| Procedural provider | `server/src/worlds/maps/ProceduralChunkProvider.ts` | +| Persistence provider | `server/src/worlds/maps/PersistenceChunkProvider.ts` | +| World loop | `server/src/worlds/WorldLoop.ts` | + +--- + +## 8. PR #21 Action Items + +1. **Merge PR #21** for JSON-map games (boilerplate, huntcraft, etc.). +2. **Reuse chunk cache collider patterns** in `ChunkLattice` if applicable. +3. **Later:** Consider Brotli for region payloads or a similar compression layer. + +--- + +## 9. 
References + +- [VOXEL_PERFORMANCE_MASTER_PLAN.md](./VOXEL_PERFORMANCE_MASTER_PLAN.md) +- [CHUNK_LOADING_ARCHITECTURE.md](./CHUNK_LOADING_ARCHITECTURE.md) +- [VOXEL_RENDERING_RESEARCH.md](./VOXEL_RENDERING_RESEARCH.md) +- [OPTIMIZATION_STRATEGY.md](./OPTIMIZATION_STRATEGY.md) +- [PR #21 – Compressed world maps](https://github.com/hytopiagg/hytopia-source/pull/21) +- [0fps Greedy Meshing](https://0fps.net/2012/06/30/meshing-in-a-minecraft-game/) +- [mikolalysenko/greedy-mesher](https://github.com/mikolalysenko/greedy-mesher) +- [Minecraft Chunk Loading (Technical Wiki)](https://techmcdocs.github.io/pages/GameMechanics/ChunkLoading/) +- [Hytale Engine Technical Deep Dive](https://hytalecharts.com/news/hytale-engine-technical-deep-dive) diff --git a/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/VOXEL_ENGINE_2026_MASTER_PLAN (1).md b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/VOXEL_ENGINE_2026_MASTER_PLAN (1).md new file mode 100644 index 00000000..c74ee120 --- /dev/null +++ b/ai-memory/docs/perf-external-notes-2026-03-05/raw/duplicates/VOXEL_ENGINE_2026_MASTER_PLAN (1).md @@ -0,0 +1,218 @@ +# Voxel Engine 2026: World-Class Performance Master Plan + +**Document Owner:** Head of Development +**Classification:** Engineering Roadmap +**Target:** Minecraft/Hytale-grade smoothness; browser-first, 2026-ready +**Version:** 1.0 +**Date:** March 2026 + +--- + +## Executive Summary + +Hytopia aims to deliver voxel gameplay that feels as smooth and responsive as Minecraft and Hytale, while running in the browser. The current architecture has solid foundations—async chunk loading, worker terrain generation, deferred colliders—but several bottlenecks prevent parity with industry leaders. This plan addresses those gaps with a phased, research-backed approach that delivers measurable improvements without over-engineering. + +**Key thesis:** The lag and stutter are almost entirely **software architecture** issues, not hardware. 
Minecraft and Hytale run smoothly on similar hardware because they use different patterns. We close the gap by adopting those patterns. + +**Target outcome:** Walk/fly through a procedural world with **no perceptible lag spikes** within the preload radius, **stable 60 FPS** on the client, and **<16 ms server tick times** (p99). + +--- + +## Part 1: Strategic Context + +### 1.1 Industry Benchmark: What “On Par” Means + +| Game | Chunk Load | Physics | Rendering | Network | Notes | +|------|------------|---------|-----------|---------|-------| +| **Minecraft Java** | Worker threads, region format | Per-chunk colliders, deferred | Greedy meshing (approximate), occlusion | Delta/delta-like entity sync | 15+ years of iteration | +| **Minecraft Bedrock** | Async pipeline, priority queue | Spatial partitioning | Meshing + LOD | Variable tick rate by distance | C++ / C#; mobile-first | +| **Hytale** | Worker pool, variable chunk sizes | Batched, spatial | Mesh culling, LOD | QUIC, lower latency | Modern engine, Flecs ECS | +| **Bloxd.io** | Browser streaming | Custom voxel physics | Face culling, vertex pooling | JS-based | Browser-only | + +**Hytopia’s position:** We are browser-bound (Node server + Web client). We can’t use C++ or multiple cores on the client, but we *can* adopt the same *concepts*: async I/O, spatial locality, greedy meshing, quantized network formats, and time-budgeted main-thread work. 
+ +### 1.2 Gap Analysis (Prioritized) + +| Priority | Gap | Impact | Root Cause | +|----------|-----|--------|------------| +| P0 | Collider work O(world) | Tick spikes, unplayable under load | `_combineVoxelStates` scans all chunks of each block type | +| P0 | No greedy meshing | 2–64× more vertices than needed | Per-face quads, no merging | +| P1 | Entity sync volume | ~90% of packets | Full pos/rot floats, no quantization | +| P1 | Sync chunk persist | Main-thread blocking | `writeChunk` sync | +| P2 | No occlusion culling | Overdraw in caves | All loaded batches rendered | +| P2 | No distance-based entity LOD | Far entities same cost as near | Single sync rate | +| P3 | Vertex allocation churn | GC spikes on mesh updates | No pooling | + +--- + +## Part 2: Phased Roadmap + +### Phase 0: Foundation & Instrumentation (Week 1) + +**Goal:** Establish baselines and guardrails before major refactors. + +| Task | Owner | Deliverable | +|------|-------|-------------| +| Profiling hooks | Eng | Tick duration, chunk load time, collider time, mesh build time | +| Metrics dashboard | Eng | Real-time charts for key metrics | +| Block/face limits | Eng | Hard cap (e.g. 500K faces) to avoid meltdown | +| Regression suite | QA | Automated “fly-through” test, capture tick/frame times | + +**Success:** We can measure and reproduce performance issues in CI and on-device. + +--- + +### Phase 1: Collider Locality (Weeks 2–3) + +**Goal:** Remove O(world) collider scans. Physics and chunk work must scale with **visible/nearby** chunks only. 
+ +| Task | Effort | Description | +|------|--------|-------------| +| Spatial index for block placements | 3 days | Chunk key → block placements; no global iteration | +| Scoped `_combineVoxelStates` | 2 days | Merge only chunks within N chunks of any player | +| Collider unload for distant chunks | 1 day | Remove colliders when chunk unloads; don’t keep in physics | +| Time-budget verification | 0.5 day | Ensure 8 ms cap is respected; tune if needed | + +**Files:** `ChunkLattice.ts`, `playground.ts` + +**Success:** Tick time (p99) drops from 50–200 ms to <25 ms under typical load. + +--- + +### Phase 2: Main-Thread Freedom (Weeks 4–5) + +**Goal:** No sync blocking on I/O or heavy computation on the game loop. + +| Task | Effort | Description | +|------|--------|-------------| +| Async `persistChunk` | 1.5 days | Queue writes; flush in background | +| Async provider audit | 0.5 day | Confirm `requestChunk` → `getChunkAsync` path is used | +| Incremental voxel collider updates | 4 days | Add blocks in batches (256–512/tick) instead of full chunk | +| Chunk send pacing | 1.5 days | Smooth chunk sync; avoid burst of 8 chunks in one tick | + +**Files:** `PersistenceChunkProvider.ts`, `RegionFileFormat.ts`, `ChunkLattice.ts`, `NetworkSynchronizer.ts` + +**Success:** Chunk load + persist never block tick; no “catch up” spikes. + +--- + +### Phase 3: Entity Sync Compression (Weeks 6–7) + +**Goal:** Reduce entity pos/rot from ~90% of packets to <50%, with no perceptible quality loss. 
+ +| Task | Effort | Description | +|------|--------|-------------| +| Quantized position (1/256 block, 16-bit) | 1 day | Server sends `pq`; client decodes | +| Yaw-only rotation for players | 0.5 day | 1 float vs 4 for player avatars | +| Distance-based sync rate (30/15/5 Hz) | 1 day | Near = 30 Hz, mid = 15 Hz, far = 5 Hz | +| Quantized quaternion (smallest-three) | 2 days | For NPCs and other full-rotation entities | +| Bulk pos/rot packet (optional) | 2 days | Structure-of-arrays for unreliable updates | + +**Files:** `Serializer.ts`, `NetworkSynchronizer.ts`, `protocol/schemas/Entity.ts`, `Deserializer.ts`, `EntityManager.ts` + +**Success:** Entity sync bytes/update reduced by 50–60%; bandwidth share <50%. + +--- + +### Phase 4: Greedy Meshing (Weeks 8–10) + +**Goal:** Cut vertex count by 2–64× for typical terrain; stable 60 FPS on chunk load. + +| Task | Effort | Description | +|------|--------|-------------| +| Greedy mesh algorithm (opaque solids) | 5 days | 0fps-style sweep and merge; ref `docs/research/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md` | +| Integration with ChunkWorker | 2 days | Per-batch-type merge; transparent blocks unchanged | +| AO + lighting on merged quads | 1 day | Ensure ambient occlusion and lighting still apply | +| Benchmarks and tuning | 1 day | Measure build time vs vertex reduction | + +**Files:** `ChunkWorker.ts`, `ChunkMeshManager.ts` + +**Success:** Flat chunk: ~6000 vertices → ~200–500; frame time stable on new chunk load. + +--- + +### Phase 5: Render Pipeline Polish (Weeks 11–13) + +**Goal:** GPU efficiency and graceful degradation on low-end devices. 
+ +| Task | Effort | Description | +|------|--------|-------------| +| Vertex pooling | 2 days | Reuse BufferGeometry/ArrayBuffers; avoid per-frame allocations | +| Occlusion culling always-on | 2 days | BFS from camera; cull hidden batches | +| Mesh apply budget | 1 day | Limit meshes applied per frame; spread load | +| Block/face limits enforcement | 0.5 day | Reduce view distance when over cap | + +**Files:** `ChunkMeshManager.ts`, `ChunkManager.ts`, `ChunkWorker.ts`, `Renderer.ts` + +**Success:** No GC spikes on chunk load; overdraw reduced in cave-heavy areas. + +--- + +### Phase 6: Long-Term (Month 4+) + +| Task | Impact | Effort | +|------|--------|--------| +| LOD impostors for distant chunks | Medium | 2–3 weeks | +| Brotli (or similar) for region payloads | Low | 1 week | +| Predictive chunk preload | Medium | 1 week | +| Client-side entity prediction | Medium (latency) | 2+ weeks | + +--- + +## Part 3: Research Documentation + +The following research docs support implementation and design decisions: + +| Document | Purpose | +|----------|---------| +| [MINECRAFT_ARCHITECTURE_RESEARCH.md](./research/MINECRAFT_ARCHITECTURE_RESEARCH.md) | How Minecraft structures chunk loading, colliders, and meshing | +| [GREEDY_MESHING_IMPLEMENTATION_GUIDE.md](./research/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md) | Step-by-step greedy meshing for ChunkWorker | +| [COLLIDER_ARCHITECTURE_RESEARCH.md](./research/COLLIDER_ARCHITECTURE_RESEARCH.md) | Spatial locality and incremental colliders | +| [NETWORK_PROTOCOL_2026_RESEARCH.md](./research/NETWORK_PROTOCOL_2026_RESEARCH.md) | Modern entity sync: quantization, delta, LOD | + +**Mandate:** Engineers implementing Phase 2+ work must read the relevant research doc before coding. 
+ +--- + +## Part 4: Success Metrics + +| Metric | Baseline (Current) | Phase 3 Target | Phase 6 Target | +|--------|--------------------|----------------|----------------| +| Server tick time (p99) | 50–200 ms | <25 ms | <16 ms | +| Chunk load (blocking) | 20–100 ms | 0 (async) | 0 | +| Vertices per flat chunk | ~6000 | ~200–500 | ~200–500 | +| Entity sync % of packets | ~90% | ~60% | <50% | +| Client frame time (p99) | Spikes to 50+ ms | <25 ms | <16 ms | +| Perceived lag spikes | Every ~5 steps | None in preload | None | + +--- + +## Part 5: Risks & Mitigations + +| Risk | Mitigation | +|------|------------| +| Greedy meshing regresses build time | Time-budget; fallback to non-greedy if over budget | +| Protocol changes break old clients | Backward-compatible optional fields; version handshake | +| Collider refactor introduces physics bugs | Rigorous test: spawn, walk, mine, place; compare before/after | +| Scope creep | Phases are fixed; Phase 6 is explicitly “long-term” | + +--- + +## Part 6: Dependencies & Prerequisites + +- **PR #21 (Compressed JSON maps):** Merge for JSON-map games; not blocking procedural world. +- **TerrainWorkerPool:** Already in place; verify `getChunkAsync` is used in playground. +- **Protocol package:** Schema changes require protocol version bump; coordinate with SDK consumers. +- **Browser support:** Target evergreen browsers; no polyfills for cutting-edge APIs. + +--- + +## Part 7: Sign-Off + +This plan represents a realistic path to Minecraft/Hytale-grade smoothness for Hytopia’s procedural world. It prioritizes the highest-impact bottlenecks (colliders, greedy meshing, entity sync) and defers nice-to-haves (LOD impostors, prediction) to later phases. + +**Recommendation:** Approve and execute Phase 0–1 immediately. Re-evaluate after Phase 3 based on metrics and user feedback. 
+ +--- + +*— Head of Development* diff --git a/ai-memory/docs/perf-final-2026-03-05/FINAL.md b/ai-memory/docs/perf-final-2026-03-05/FINAL.md new file mode 100644 index 00000000..060508ae --- /dev/null +++ b/ai-memory/docs/perf-final-2026-03-05/FINAL.md @@ -0,0 +1,394 @@ +# HYTOPIA Performance (Client + Server) — Consolidated Findings (2026-03-05) + +Base code reference for all “Verified” statements in this report: + +- `origin/master` @ `24a295d` (2026-03-05) +- Repo: `web3dev1337/hytopia-source` (fork of `hytopiagg/hytopia-source`) + +This is a synthesis of: + +- **Code-verified findings** (client, server, protocol) +- **Your open performance PRs** on the fork +- **Imported third‑party notes** from Windows Downloads (`/mnt/c/Users/AB/Downloads`) captured on **2026-03-05 14:09–14:25** (local time) and stored under: + - `ai-memory/docs/perf-external-notes-2026-03-05/raw/` + - Cross-check doc: `ai-memory/docs/perf-external-notes-2026-03-05/FINDINGS.md` + +--- + +## Executive Summary (What’s Actually Hurting Performance Today) + +### P0 (highest impact, verified in code) + +1) **Server → client chunk delivery is bursty and unbounded** + - On **player join/reconnect**, the server queues **every chunk in the world** for that player (`NetworkSynchronizer._onPlayerJoinedWorld` loops `chunkLattice.getAllChunks()`), then sends them as chunk packets with **no pacing/segmentation**. + - Every chunk is serialized with `Array.from(chunk.blocks)` (a 4096-element `number[]` per chunk), which is allocation-heavy and can inflate payload sizes. + +2) **Client chunk meshing is “per visible face”, not greedy** + - Face culling exists, but there is **no quad merging / greedy meshing**, so vertex counts are much higher than necessary in common terrain. + - This increases worker CPU, transfer sizes, main-thread mesh apply costs, GPU memory, and draw overhead.
3) **Client network decoding can block the main thread** + - Incoming packets are synchronously gzip-decompressed (`gunzipSync`) and msgpack-decoded on the main thread. + - Large chunk packets, combined with synchronous decompression and decoding, translate directly into visible stutter. + +4) **Client creates/destroys GPU geometry frequently** + - Chunk batch updates replace `BufferGeometry` objects and dispose of the old ones rather than updating attributes in place (no pooling/reuse). + +### P1 (medium/high impact, verified in code) + +5) **Entity sync bandwidth is larger than it needs to be** + - Entity pos/rot updates are float vectors/quaternions (float32) with no quantization or delta compression. + - The server already routes pos/rot‑only updates to the unreliable channel, which is good, but each update still carries full-precision floats. + +6) **Protocol + serializer choices force avoidable copying** + - `protocol/schemas/Chunk.ts`’s AJV JSON schema only accepts `b` as `number[]` (4096 entries), so the server serializes chunk blocks via `Array.from`. + - The client then always does `new Uint8Array(chunk.b)`, which **copies** again. + +### P2 (lower impact or situational, verified in code) + +7) **View-distance culling work is O(batches) every frame** + - Each frame, the client iterates all batch IDs and computes distances to decide scene membership. + +8) **MEDIUM/LOW presets have no FPS cap; DPR is unbounded** + - Only `POWER_SAVING` has an `fpsCap` on `master`. + - Renderer pixel ratio uses `window.devicePixelRatio * resolution.multiplier` with no cap. + +--- + +## Verified Issues (with Evidence + Recommended Fixes) + +### 1) Server chunk sync: “full world on join” + no pacing (P0) + +**Evidence (verified):** + +- `server/src/networking/NetworkSynchronizer.ts` + - `_onPlayerJoinedWorld`: queues chunk sync for **all chunks**: + - `for (const chunk of this._world.chunkLattice.getAllChunks()) { ... chunk.serialize() ...
}` + - `_collectSyncToOutboundPackets`: turns queued chunk syncs into one packet per sync without pacing: + - `protocol.createPacket(..., sync.valuesArray, tick)` + +**Impact:** + +- Server CPU + memory spikes on join/reconnect (serialize + validate + msgpack pack + gzip). +- Client stutters on receipt (sync gunzip + unpack + per-chunk registry + worker messages + mesh builds). +- Networking bursts can increase packet loss / HOL blocking (and increases likelihood of gzip work). + +**Fix direction:** + +- Implement **per-player chunk streaming**: + - Maintain `playerVisibleChunkSet` (or batch set) derived from player position + view distance. + - Queue only *newly visible* chunks; send removals when leaving range. +- Add **pacing/segmentation**: + - Enforce a per-player per-tick budget (chunks, bytes, or ms). + - Never enqueue “all chunks” into one `Chunks` packet; emit multiple smaller packets across ticks. + +**Related (your PRs):** + +- None directly address server chunk pacing today. +- PR #6 (map compression) reduces disk/map load size but does not solve network pacing. + +**Related (external notes):** + +- `VOXEL_PERFORMANCE_MASTER_PLAN.md`, `SMOOTH_WORLD_STREAMING_REFACTOR_PLAN.md` correctly push “chunk send pacing” as a requirement, but reference constants/systems that do not exist on `master`. + +--- + +### 2) Server chunk serialization allocates huge arrays (P0) + +**Evidence (verified):** + +- `server/src/networking/Serializer.ts` + - `serializeChunk()` does: + - `b: Array.from(chunk.blocks)` + - `r: Array.from(chunk.blockRotations).flatMap(...)` + +**Impact:** + +- For each chunk sent, allocates a 4096-element `number[]` (and then msgpack serializes it). +- For join sync, this multiplies by total chunk count and happens per joining player. 
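The sketch below puts numbers on the payload side of this impact. It models the MessagePack size of a 4096-entry `b` field in both forms, under these assumptions:

- Standard MessagePack rules apply: positive fixint for 0–127, a `0xcc` marker plus one byte for 128–255, and 3-byte `array16`/`bin16` headers at this length.
- msgpackr encodes a `Uint8Array` as `bin`.

It only counts bytes and does not call msgpackr. `arrayEncodedSize` and `binEncodedSize` are hypothetical helpers.

```typescript
// Hypothetical size model for a chunk's `b` field under standard MessagePack rules.
// It only counts bytes; it does not call msgpackr.
// Lengths above 65535 (array32/bin32) are out of scope for a 4096-entry chunk.

/** Encoded size of one block ID (0–255) inside a MessagePack array. */
function uintSize(id: number): number {
  return id <= 0x7f ? 1 : 2; // positive fixint, or 0xcc marker + 1 byte
}

/** Bytes for `b` sent as a plain number[] (what Array.from produces). */
export function arrayEncodedSize(blocks: Uint8Array): number {
  // fixarray header for <= 15 items, otherwise array16 (0xdc + 2 length bytes)
  let size = blocks.length <= 15 ? 1 : 3;
  for (const id of blocks) {
    size += uintSize(id);
  }
  return size;
}

/** Bytes for `b` sent as a MessagePack bin value (e.g. a Uint8Array). */
export function binEncodedSize(blocks: Uint8Array): number {
  // bin8 (0xc4 + 1 length byte) up to 255 bytes, otherwise bin16 (0xc5 + 2 length bytes)
  const header = blocks.length <= 0xff ? 2 : 3;
  return header + blocks.length;
}
```

The results depend on the block IDs:

- **All IDs below 128:** both forms come to 4099 bytes.
- **All IDs at 128 or above:** the array form grows to 8195 bytes, while `bin` stays at 4099.

So the wire-size gain from `bin` depends on the block-ID mix. Avoiding the 4096-element `number[]` allocation and the per-element encode/decode work helps every chunk.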
+
+**Fix direction (high leverage):**
+
+- Align protocol schema + serialization to allow `Uint8Array` “bin” payloads:
+  - Update protocol schema validation to accept `Uint8Array` for `ChunkSchema.b`, and have the server send it in that form.
+  - Update the client deserializer to **avoid copying** when `b` is already a `Uint8Array`.
+  - Goal: `ChunkSchema.b` transmitted as msgpack “bin” (compact, fast) instead of an array of numbers.
+
+---
+
+### 3) Server gzip is synchronous in the hot path (P0/P1)
+
+**Evidence (verified):**
+
+- `server/src/networking/Connection.ts`
+  - `Connection.serializePackets()` uses `gzipSync` for payloads > 64KB.
+
+**Impact:**
+
+- Compression runs on the server main thread, causing tick spikes during large chunk flushes.
+
+**Fix direction:**
+
+- Reduce the need for gzip by shrinking payloads first (typed arrays for chunks, pacing).
+- If gzip remains necessary:
+  - Consider async compression (Node's callback-based `zlib.gzip` runs on the libuv thread pool, or use a worker thread) or a different framing strategy.
+
+---
+
+### 4) Server validates packets with AJV before every send (P1)
+
+**Evidence (verified):**
+
+- `server/src/networking/Connection.ts`
+  - `serializePackets()` calls `protocol.isValidPacket(packet)` for every packet, on every send.
+
+**Impact:**
+
+- AJV validation of large payload packets (especially chunks) is CPU-expensive.
+
+**Fix direction (safer than “turn it off”):**
+
+- Cache validation results per packet object identity for the duration of a sync tick (similar to the serialization cache).
+- Consider skipping deep validation for the heaviest, most-constructed packets in production builds, but only with strong safeguards (tests, feature flag).
+
+---
+
+### 5) Client chunk meshing lacks greedy quad merging (P0)
+
+**Evidence (verified):**
+
+- `client/src/workers/ChunkWorker.ts`
+  - Per block → per face → if visible → emit quad.
+  - Face culling exists (neighbor check); there is no 2D “merge rectangles” pass.
+- External note cross-check: `ai-memory/docs/perf-external-notes-2026-03-05/FINDINGS.md`
+
+**Impact:**
+
+- Greatly increases:
+  - Worker compute time
+  - Geometry transfer sizes
+  - Main-thread `BufferGeometry` creation cost
+  - GPU vertex processing and memory pressure
+
+**Fix direction:**
+
+- Implement greedy meshing for opaque solids first:
+  - See external guide: `ai-memory/docs/perf-external-notes-2026-03-05/raw/GREEDY_MESHING_IMPLEMENTATION_GUIDE.md`
+  - Keep transparent/liquid/trimesh paths per-face initially if needed.
+
+**Notes on third-party docs:**
+
+- The greedy meshing guidance is generally sound, but some example metrics and some “Implemented (Hytopia)” claims in `VOXEL_RENDERING_RESEARCH.md` do **not** match this repo’s `master`.
+
+---
+
+### 6) Client builds new BufferGeometry per update (P0/P1)
+
+**Evidence (verified):**
+
+- `client/src/chunks/ChunkMeshManager.ts`
+  - `_createOrUpdateMesh()` always creates `new BufferGeometry()`.
+  - On update, it disposes of the old geometry and swaps in the new one.
+
+**Impact:**
+
+- GPU buffer churn + JS allocations during chunk streaming and block edits.
+
+**Fix direction:**
+
+- Reuse geometries:
+  - Keep one `BufferGeometry` per batch mesh and update `BufferAttribute` arrays in place.
+  - If sizes change frequently, pool common sizes or group updates into fixed-size “slabs”.
+
+---
+
+### 7) Client network decode is synchronous on the main thread (P0)
+
+**Evidence (verified):**
+
+- `client/src/network/NetworkManager.ts`
+  - `gunzipSync` (fflate) decompresses gzip payloads before `packr.unpack`.
+
+**Impact:**
+
+- Large chunk packets can block the render thread, causing frame hitches.
+
+**Fix direction:**
+
+- Move decompression + unpacking off the main thread (net worker).
+- Reduce/avoid gzip by shrinking chunk payloads (typed arrays, pacing).
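Below is a minimal, dependency-free sketch of the decompression half of the net-worker idea. It rests on two assumptions:

- Gzip payloads can be recognized by their RFC 1952 magic bytes (`0x1f 0x8b`). How `NetworkManager` actually detects gzip is not shown here.
- Target browsers support the Compression Streams API.

`isGzip`, `gunzipAsync`, and `toMsgpackBytes` are hypothetical names. The built-in `DecompressionStream` stands in for fflate's `gunzipSync`. The intent is to call these inside a dedicated worker, so neither decompression nor the following `packr.unpack` runs on the render thread.

```typescript
// Hypothetical helpers for a dedicated network worker (not present on `master`).
// Swaps fflate's gunzipSync for the standard Compression Streams API.

/** Gzip members start with the magic bytes 0x1f 0x8b (RFC 1952). */
export function isGzip(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

/** Decompresses a gzip payload asynchronously instead of in one blocking gunzipSync call. */
export async function gunzipAsync(compressed: Uint8Array): Promise<Uint8Array> {
  // Blob -> ReadableStream -> DecompressionStream; Response collects the output bytes.
  const stream = new Blob([compressed]).stream().pipeThrough(new DecompressionStream('gzip'));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

/** Returns raw msgpack bytes, decompressing only when the payload is gzip. */
export async function toMsgpackBytes(payload: Uint8Array): Promise<Uint8Array> {
  return isGzip(payload) ? gunzipAsync(payload) : payload;
}
```

In the worker, an `onmessage` handler would:

1. Receive the raw packet bytes, ideally as a transferred `ArrayBuffer`.
2. Await `toMsgpackBytes`.
3. Unpack the result.
4. Post it back to the main thread.

The decoded objects are structured-cloned on the way back, which has its own cost. So the benefit is largest for big chunk packets.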
+
+---
+
+### 8) Client deserialization does extra copying + allocations (P1)
+
+**Evidence (verified):**
+
+- `client/src/network/Deserializer.ts`
+  - `deserializeChunk`: `blocks: chunk.b ? new Uint8Array(chunk.b) : undefined` → always copies.
+  - `deserializeVector` / `deserializeQuaternion`: allocate new objects per update.
+
+**Impact:**
+
+- Additional CPU/GC pressure on the main thread during frequent updates.
+
+**Fix direction:**
+
+- Skip the copy when `chunk.b` is already a `Uint8Array`.
+- For hot-path entity updates, consider:
+  - Updating existing entity objects in place with primitives/typed arrays
+  - A new bulk packet format (structure-of-arrays) for pos/rot
+
+---
+
+### 9) Client view-distance culling does per-frame full scans (P2)
+
+**Evidence (verified):**
+
+- `client/src/chunks/ChunkManager.ts`
+  - On each `RendererEventType.Animate`, it iterates all batch IDs and computes their distances.
+  - A code comment notes this may be costly and suggests caching/partitioning.
+
+**Impact:**
+
+- The cost becomes noticeable as the batch count grows (CPU time per frame).
+
+**Fix direction:**
+
+- Cache visibility and recompute only when:
+  - the camera moves across coarse “cells”
+  - settings change (view distance)
+  - the batch set changes
+- Your PR #9 (upstream mirror) targets this area.
+
+---
+
+### 10) FPS cap + DPR cap missing on `master` (P2 quick wins)
+
+**Evidence (verified):**
+
+- `client/src/settings/SettingsManager.ts`
+  - Only `POWER_SAVING` has `fpsCap: 30`.
+  - `MEDIUM` / `LOW` have none.
+- `client/src/core/Renderer.ts`
+  - Pixel ratio uses `window.devicePixelRatio * resolution.multiplier` with no cap.
+
+**Fix direction (already in your PRs):**
+
+- PR #4: adds `fpsCap: 60` for `MEDIUM`/`LOW`.
+- PR #5: caps mobile `devicePixelRatio` before applying the resolution multiplier.
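Both quick wins reduce to small guards. The sketch below is illustrative only and is not the code in PR #4 or PR #5:

- `createFrameLimiter` skips render callbacks that arrive less than one `1000 / fpsCap` ms interval after the previous rendered frame.
- `effectivePixelRatio` clamps the mobile device pixel ratio before applying the resolution multiplier.

Both function names and the mobile cap of `2` are assumptions.

```typescript
// Illustrative guards for the two P2 quick wins (hypothetical names, not PR #4/#5 code).

/** Returns a predicate that says whether a frame at `nowMs` should be rendered. */
export function createFrameLimiter(fpsCap?: number): (nowMs: number) => boolean {
  let lastRenderMs = -Infinity;
  return (nowMs: number): boolean => {
    if (!fpsCap) return true; // no cap configured: render on every animation frame
    const intervalMs = 1000 / fpsCap;
    const elapsedMs = nowMs - lastRenderMs;
    if (elapsedMs < intervalMs) return false;
    // Keep the leftover time so the average rate stays close to the cap instead of drifting lower.
    lastRenderMs = Number.isFinite(lastRenderMs) ? nowMs - (elapsedMs % intervalMs) : nowMs;
    return true;
  };
}

/** Pixel ratio to pass to the renderer, clamping DPR on mobile before the multiplier. */
export function effectivePixelRatio(
  devicePixelRatio: number,
  resolutionMultiplier: number,
  isMobile: boolean,
  mobileDprCap = 2, // assumed cap, not taken from PR #5
): number {
  const dpr = isMobile ? Math.min(devicePixelRatio, mobileDprCap) : devicePixelRatio;
  return dpr * resolutionMultiplier;
}
```

A render loop would call the limiter with the `requestAnimationFrame` timestamp and return early when it yields `false`. It would then pass `effectivePixelRatio(...)` to `WebGLRenderer.setPixelRatio`.

A cap equal to the display refresh rate can drop frames when timestamps land slightly under the interval, so a small tolerance is common in practice.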
+
+---
+
+## Entity Sync: What’s Right + What’s Missing (Verified)
+
+### What’s already good
+
+- `server/src/networking/NetworkSynchronizer.ts`:
+  - Pos/rot-only entity updates are identified and sent on the **unreliable** channel.
+  - This reduces HOL blocking under packet loss.
+
+### What’s missing (main opportunities)
+
+- No quantized or delta formats exist in the protocol today (`protocol/schemas/Entity.ts` only has `p` and `r`).
+- There is no distance-based sync LOD.
+
+### External note accuracy
+
+- `ENTITY_SYNC_DELTA_COMPRESSION_DESIGN.md` contains good direction (quantization, distance-based LOD), but its **int16 range math is wrong** when using a 1/256 quantization factor.
+  - If you store `round(x * 256)` in an int16, the representable range is roughly **±128 blocks** (int16 spans -32768 to 32767, and 32768 / 256 = 128), not ±32768.
+  - If you need both large-world range and fine precision, use chunk-relative encoding, wider integers (int32), and/or a different quantization scale.
+
+---
+
+## Colliders / Physics: What’s Real vs. What’s Assumed
+
+### Verified current collider model
+
+- `server/src/worlds/blocks/ChunkLattice.ts`
+  - Maintains colliders per **block type** (voxel or trimesh).
+  - Voxel updates use `collider.setVoxel(...)` + `propagateVoxelChange(...)`.
+  - Trimesh block types trigger a full collider rebuild on changes (`_recreateTrimeshCollider`).
+- `server/src/worlds/physics/Collider.ts`
+  - `combineVoxelStates` / `propagateVoxelChange` are Rapier-specific voxel-collider edge/transition requirements, not “merge placements” loops.
+
+### What the external notes get right
+
+- Trimesh rebuilds can be expensive as block counts grow.
+- “Collider locality” (only simulate nearby blocks) is a valid scaling approach for very large worlds, but it is **not implemented** in this repo today.
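The minimal sketch below illustrates what collider locality could look like. It derives the set of chunks near any player and diffs that set between ticks, so only added or removed chunks would need collider work. It assumes:

- cubic chunks with a hypothetical edge length of 16 blocks;
- a cube-shaped (Chebyshev) radius measured in chunks.

`chunksNeedingColliders` and `diffColliderSets` are illustrative names, not repo APIs. Because `ChunkLattice` keeps one collider per block type today, applying this would also mean changing how voxels are added to and removed from those colliders.

```typescript
// Hypothetical collider-locality helpers (not present on `master`).
// Assumes cubic chunks with CHUNK_SIZE blocks per edge.
const CHUNK_SIZE = 16;

export interface Vec3 {
  x: number;
  y: number;
  z: number;
}

/** String key for a chunk coordinate. */
export function chunkKey(cx: number, cy: number, cz: number): string {
  return `${cx},${cy},${cz}`;
}

/**
 * Keys of every chunk within `radius` chunks (Chebyshev distance, i.e. a cube)
 * of at least one player. Only these chunks would keep active block colliders.
 */
export function chunksNeedingColliders(players: Vec3[], radius: number): Set<string> {
  const keys = new Set<string>();
  for (const p of players) {
    const cx = Math.floor(p.x / CHUNK_SIZE);
    const cy = Math.floor(p.y / CHUNK_SIZE);
    const cz = Math.floor(p.z / CHUNK_SIZE);
    for (let dx = -radius; dx <= radius; dx++) {
      for (let dy = -radius; dy <= radius; dy++) {
        for (let dz = -radius; dz <= radius; dz++) {
          keys.add(chunkKey(cx + dx, cy + dy, cz + dz));
        }
      }
    }
  }
  return keys;
}

/** Chunks whose colliders would be created or removed between two ticks. */
export function diffColliderSets(prev: Set<string>, next: Set<string>) {
  const toAdd = [...next].filter((key) => !prev.has(key));
  const toRemove = [...prev].filter((key) => !next.has(key));
  return { toAdd, toRemove };
}
```

Calling `chunksNeedingColliders` once per tick and applying only the `toAdd`/`toRemove` lists would keep collider work bounded to the neighborhood of players.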
+
+### What the external notes get wrong (relative to this repo)
+
+- Multiple notes reference systems/constants that do not exist on `master`:
+  - `CHUNKS_PER_TICK`, `MAX_CHUNKS_PER_SYNC`, `TerrainWorkerPool`, `PersistenceChunkProvider`, `RegionFileFormat.ts`, `COLLIDER_MAX_CHUNK_DISTANCE`, `processPendingColliderChunks`, etc.
+
+---
+
+## Your Performance PRs (Fork) — What They Address
+
+All PRs below are on `web3dev1337/hytopia-source` as of **2026-03-05**.
+
+- **#4** “cap FPS on MEDIUM/LOW presets” (OPEN) — adds `fpsCap: 60` to `MEDIUM`/`LOW`.
+- **#5** “cap mobile devicePixelRatio” (OPEN) — clamps mobile DPR before applying the resolution multiplier.
+- **#6** “compressed world maps” (OPEN) — reduces map disk size + load time for JSON-map games; not a direct fix for chunk networking.
+- **#7–#9** upstream mirrors (OPEN) — prediction/camera smoothing/client perf pass (chunk visibility caching + outline improvements in #9).
+- **#2–#3** analysis docs (OPEN) — large audits and device-specific performance writeups.
+
+This consolidated report focuses on the *root* hot paths on `master`; #4, #5, #6, and #9 address specific slices of them.
+
+---
+
+## Third-Party Notes: What’s Correct vs Incorrect (Index)
+
+All files referenced below are imported under `ai-memory/docs/perf-external-notes-2026-03-05/raw/`.
+
+### Mostly correct about THIS repo (good signal)
+
+- `MAP_ENGINE_ARCHITECTURE.md` — accurately describes the JSON map → `World.loadMap` → `ChunkLattice` → `NetworkSynchronizer` flow.
+- `GREEDY_MESHING_IMPLEMENTATION_GUIDE.md` — sound generic greedy meshing guidance (implementation work remains).
+- `NETWORK_PROTOCOL_2026_RESEARCH.md` — good general direction, but contains quantization math errors (see above).
+
+### Mixed (some correct observations + some incorrect assumptions)
+
+- `VOXEL_RENDERING_RESEARCH.md` — correct that face culling is present and greedy meshing is absent; its “Implemented (Hytopia)” section does not match `master`.
+- `COLLIDER_ARCHITECTURE_RESEARCH.md` — correct high-level collider structure (per block type); incorrect about some “critical path” details and assumes locality/pipelines not present.
+- `ENTITY_SYNC_DELTA_COMPRESSION_DESIGN.md` — correct about current entity sync shape; useful ideas; incorrect numeric range claims for int16@1/256.
+
+### Mostly not about THIS repo (assumes systems not present)
+
+- `VOXEL_PERFORMANCE_MASTER_PLAN.md`
+- `SMOOTH_WORLD_STREAMING_REFACTOR_PLAN.md`
+- `VOXEL_ENGINE_2026_MASTER_PLAN.md`
+- `MINECRAFT_ARCHITECTURE_RESEARCH.md` (Minecraft info is fine; claims about Hytopia’s procedural systems don’t match this repo)
+
+---
+
+## Recommended Plan (Grounded in Current Code)
+
+### Phase A — immediate wins (days)
+
+1) Merge **PR #4** (FPS cap) and **PR #5** (mobile DPR cap).
+2) Stop copying chunk blocks twice:
+   - Update client `Deserializer.deserializeChunk` to avoid `new Uint8Array(...)` when `b` is already a `Uint8Array`.
+   - Update protocol + server serializer to send chunk blocks as `Uint8Array` (bin).
+3) Implement chunk pacing on join:
+   - Replace the “queue all chunks” join behavior with a time/byte budget.
+
+### Phase B — largest structural wins (1–2 weeks)
+
+4) Implement per-player chunk streaming by view distance (server side).
+5) Move client decompression + unpacking off the main thread (or shrink payloads enough that gzip rarely triggers).
+
+### Phase C — rendering ceiling (2–4+ weeks)
+
+6) Implement greedy meshing for opaque solids in `ChunkWorker`.
+7) Add geometry reuse/pooling in `ChunkMeshManager`.
+8) Improve the view-distance culling algorithm (or merge the upstream PR #11 mirror, if acceptable).
+
+---
+
+## Where to Find the “Proof / Verification” Doc
+
+- Verification of external-note claims against `origin/master` lives in:
+  - `ai-memory/docs/perf-external-notes-2026-03-05/FINDINGS.md`
+