-
Notifications
You must be signed in to change notification settings - Fork 11
Description
GitHub Issue Proposal for RootInteractive (JavaScript Browser Part)
Title: Feature: Implement Generalized LUT-Based Lossy Compression for Browser-Side Data (JS/WASM)
Labels: enhancement, javascript, performance, data-compression, browser
Assignees: (
Milestone: (Optional, depends on project planning)
🚀 Problem Statement
As RootInteractive pushes the boundaries of interactive data exploration in the browser, handling increasingly large numeric datasets becomes a critical challenge. Shipping and processing high-precision floating-point arrays can lead to:
- Increased Network Latency: Large data payloads slow down initial load times and subsequent updates.
- Higher Memory Consumption: Storing full-precision data in the browser consumes significant RAM, impacting performance, especially on mobile devices or for complex dashboards.
- Limited Interactivity: Slower data transfer and processing can hinder smooth zooming, panning, and real-time updates of visualizations.
While exact floating-point precision is often unnecessary for interactive visualization and approximate analysis (e.g., track residuals, cluster response), current browser-side solutions lack a generalized, efficient lossy compression mechanism that accounts for intrinsic data resolution and ensures predictable approximation errors.
✨ Proposed Solution: Generalized LUT-Based Lossy Compression
We propose integrating a generalized Lookup Table (LUT)-based lossy compression framework directly into the RootInteractive JavaScript browser environment, leveraging WebAssembly (WASM) where beneficial for performance-critical decoding.
This approach focuses on:
- Significant Data Reduction: Aims to achieve 4–8x compression ratios for numeric columns.
- Predictable Approximation: Guarantees bounded residuals and preserves statistical properties of the data through intelligent inverse mapping and dithering.
- Optimized for Browser Performance: Designed to exploit cache locality for rapid decompression, making it ideal for interactive dashboards.
🔧 Key Components & Browser-Specific Implementation Details
-
Forward LUT Compression (Server-side/Pre-processing):
- This component would primarily live on the server-side (e.g., Python backend generating data for RootInteractive).
- It applies a monotonic transform
f(x) \rightarrow y(e.g.,log,sqrt,sinh) and quantizesyto discrete bins (e.g., 256 or 65536 bins forUint8ArrayorUint16Arraystorage). - Optional preprocessing like
x' = (x - offset) / scalefor data normalization. - The generated LUT (mapping quantized
yback toxviafinv(y)) and relevant metadata (offset,scale, function type) would be stored alongside the compressed data.
-
Inverse LUT Decompression (Browser-side - JavaScript/WASM):
- The core of the browser-side implementation.
- Decompression would involve:
- Loading the compressed
Uint8ArrayorUint16Arraydata. - Loading the corresponding LUT (as a
Float32ArrayorFloat64Array). - Applying the inverse LUT mapping
x̂ = finv(y)using direct array indexing.
- Loading the compressed
-
Performance Advantage: Crucially, if the LUT size is small (e.g.,
$\approx$ 10-20 kB for 256 or 65536 entries ofFloat32), it will almost certainly fit into the L1/L2 CPU cache. This enables significantly faster decoding in JavaScript/WASM than re-calculating analytical functions on the fly for each data point, which is paramount for smooth interactivity. - WASM Potential: Performance-critical decoding loops could be implemented in WebAssembly for near-native speed, especially for larger arrays, leveraging Rust/C++ compiled to WASM.
-
Dithering (Dequantization Smearing) (Browser-side - JavaScript):
- To counteract visual artifacts (aliasing, histogram spikes) introduced by quantization, a small amount of carefully calculated randomness is added during decompression.
- Formula:
$x̂ \in [\text{finv}(y - 0.5), \text{finv}(y + 0.5)]$ , where a random number is drawn within this reconstructed bin range. - This is a pure JavaScript operation, preserving the original data's density and smoothness for visualizations and downstream statistical processing or ML models without increasing transmitted data size.
-
Parameterization & Metadata (JSON/Typed Arrays):
- Metadata (e.g.,
offset,scale,lutReference,transformType) would be stored as JSON objects alongside the compressedTypedArraydata. - This allows for per-column or per-branch configuration within RootInteractive's data structures.
- The
lutReferencecould point to a shared LUT for common transforms, or an inline LUT for unique cases.
- Metadata (e.g.,
🎯 Use Cases for RootInteractive
- Interactive Visualizations:
- Faster loading and smoother interaction for large 1D/2D histograms, scatter plots, and event displays.
- Example: Visualizing track residuals or cluster charges with minimal visual distortion but dramatically reduced data size.
- Machine Learning Feature Visualization: Displaying quantized ML features efficiently, while ensuring the underlying distribution shape is preserved for qualitative inspection.
- Low-Memory Environments: Enabling RootInteractive to handle larger datasets effectively on devices with limited memory (e.g., tablets, older laptops).
- Remote Data Access (WASM/Lazy Decoding): Efficiently fetch and decode only necessary data segments from a remote server, improving responsiveness for "drill-down" analysis.
🗓️ Current Status (RootInteractive Context)
SqrtScalingwith LUT + inverse + smearing: ✅ A working prototype exists (likely in the Python backend and a conceptual JS equivalent).- Generic LUT codec abstraction: ⌛ In progress (designing the generalized API).
- Metadata schema for
scale/offset: ⌛ In progress (defining how these parameters will be passed to and used by the browser). - Multi-language decoder support (C++/Python/JS): ⏳ Planned (this proposal focuses on the JS part).
- Lazy decoding (WASM/ONNX/in browser): ⏳ Planned (this feature directly supports and benefits from this compression).
🛣️ Next Steps for RootInteractive JavaScript
- Define Browser-Side API: Design the JavaScript API for a
MonotonicLUTQuantizerorLUTLossyCodecthat can:- Accept compressed
TypedArrayand metadata. - Perform efficient inverse LUT lookup and dithering.
- Provide configuration options for different transform types.
- Accept compressed
- Implement Core Decompression Logic: Develop the initial JavaScript implementation for the inverse LUT lookup and dithering.
- Explore WASM Integration if needed Prototype a WASM module for the most performance-critical parts of the decompression (e.g., large array processing loops) and benchmark against pure JavaScript.
- Integrate with RootInteractive Data Flow: Hook the new decompression logic into RootInteractive's data loading and rendering pipeline
- Benchmarking: Conduct thorough browser-side benchmarks comparing:
- Raw
Float32Arraytransfer/processing. - Compressed
Uint8Array/Uint16Array+ JS decompression. - Compressed
Uint8Array/Uint16Array+ WASM decompression. - Metrics: data transfer size, decode time, rendering frame rate.
- Raw
- Documentation: Document the new compression scheme, its benefits, usage, and API for RootInteractive developers and users.