Skip to content

Feature: Implement Generalized LUT-Based Lossy Compression for Browser-Side Data (JS/WASM) - #367

@miranov25

Description

@miranov25

GitHub Issue Proposal for RootInteractive (JavaScript Browser Part)

Title: Feature: Implement Generalized LUT-Based Lossy Compression for Browser-Side Data (JS/WASM)


Labels: enhancement, javascript, performance, data-compression, browser

Assignees: (

Milestone: (Optional, depends on project planning)


🚀 Problem Statement

As RootInteractive pushes the boundaries of interactive data exploration in the browser, handling increasingly large numeric datasets becomes a critical challenge. Shipping and processing high-precision floating-point arrays can lead to:

  1. Increased Network Latency: Large data payloads slow down initial load times and subsequent updates.
  2. Higher Memory Consumption: Storing full-precision data in the browser consumes significant RAM, impacting performance, especially on mobile devices or for complex dashboards.
  3. Limited Interactivity: Slower data transfer and processing can hinder smooth zooming, panning, and real-time updates of visualizations.

While exact floating-point precision is often unnecessary for interactive visualization and approximate analysis (e.g., track residuals, cluster response), current browser-side solutions lack a generalized, efficient lossy compression mechanism that accounts for intrinsic data resolution and ensures predictable approximation errors.

✨ Proposed Solution: Generalized LUT-Based Lossy Compression

We propose integrating a generalized Lookup Table (LUT)-based lossy compression framework directly into the RootInteractive JavaScript browser environment, leveraging WebAssembly (WASM) where beneficial for performance-critical decoding.

This approach focuses on:

  • Significant Data Reduction: Aims to achieve 4–8x compression ratios for numeric columns.
  • Predictable Approximation: Guarantees bounded residuals and preserves statistical properties of the data through intelligent inverse mapping and dithering.
  • Optimized for Browser Performance: Designed to exploit cache locality for rapid decompression, making it ideal for interactive dashboards.

🔧 Key Components & Browser-Specific Implementation Details

  1. Forward LUT Compression (Server-side/Pre-processing):

    • This component would primarily live on the server-side (e.g., Python backend generating data for RootInteractive).
    • It applies a monotonic transform f(x) \rightarrow y (e.g., log, sqrt, sinh) and quantizes y to discrete bins (e.g., 256 or 65536 bins for Uint8Array or Uint16Array storage).
    • Optional preprocessing like x' = (x - offset) / scale for data normalization.
    • The generated LUT (mapping quantized y back to x via finv(y)) and relevant metadata (offset, scale, function type) would be stored alongside the compressed data.
  2. Inverse LUT Decompression (Browser-side - JavaScript/WASM):

    • The core of the browser-side implementation.
    • Decompression would involve:
      • Loading the compressed Uint8Array or Uint16Array data.
      • Loading the corresponding LUT (as a Float32Array or Float64Array).
      • Applying the inverse LUT mapping x̂ = finv(y) using direct array indexing.
    • Performance Advantage: Crucially, if the LUT size is small (e.g., $\approx$ 10-20 kB for 256 or 65536 entries of Float32), it will almost certainly fit into the L1/L2 CPU cache. This enables significantly faster decoding in JavaScript/WASM than re-calculating analytical functions on the fly for each data point, which is paramount for smooth interactivity.
    • WASM Potential: Performance-critical decoding loops could be implemented in WebAssembly for near-native speed, especially for larger arrays, leveraging Rust/C++ compiled to WASM.
  3. Dithering (Dequantization Smearing) (Browser-side - JavaScript):

    • To counteract visual artifacts (aliasing, histogram spikes) introduced by quantization, a small amount of carefully calculated randomness is added during decompression.
    • Formula: $x̂ \in [\text{finv}(y - 0.5), \text{finv}(y + 0.5)]$, where a random number is drawn within this reconstructed bin range.
    • This is a pure JavaScript operation, preserving the original data's density and smoothness for visualizations and downstream statistical processing or ML models without increasing transmitted data size.
  4. Parameterization & Metadata (JSON/Typed Arrays):

    • Metadata (e.g., offset, scale, lutReference, transformType) would be stored as JSON objects alongside the compressed TypedArray data.
    • This allows for per-column or per-branch configuration within RootInteractive's data structures.
    • The lutReference could point to a shared LUT for common transforms, or an inline LUT for unique cases.

🎯 Use Cases for RootInteractive

  • Interactive Visualizations:
    • Faster loading and smoother interaction for large 1D/2D histograms, scatter plots, and event displays.
    • Example: Visualizing track residuals or cluster charges with minimal visual distortion but dramatically reduced data size.
  • Machine Learning Feature Visualization: Displaying quantized ML features efficiently, while ensuring the underlying distribution shape is preserved for qualitative inspection.
  • Low-Memory Environments: Enabling RootInteractive to handle larger datasets effectively on devices with limited memory (e.g., tablets, older laptops).
  • Remote Data Access (WASM/Lazy Decoding): Efficiently fetch and decode only necessary data segments from a remote server, improving responsiveness for "drill-down" analysis.

🗓️ Current Status (RootInteractive Context)

  • SqrtScaling with LUT + inverse + smearing: ✅ A working prototype exists (likely in the Python backend and a conceptual JS equivalent).
  • Generic LUT codec abstraction: ⌛ In progress (designing the generalized API).
  • Metadata schema for scale/offset: ⌛ In progress (defining how these parameters will be passed to and used by the browser).
  • Multi-language decoder support (C++/Python/JS): ⏳ Planned (this proposal focuses on the JS part).
  • Lazy decoding (WASM/ONNX/in browser): ⏳ Planned (this feature directly supports and benefits from this compression).

🛣️ Next Steps for RootInteractive JavaScript

  1. Define Browser-Side API: Design the JavaScript API for a MonotonicLUTQuantizer or LUTLossyCodec that can:
    • Accept compressed TypedArray and metadata.
    • Perform efficient inverse LUT lookup and dithering.
    • Provide configuration options for different transform types.
  2. Implement Core Decompression Logic: Develop the initial JavaScript implementation for the inverse LUT lookup and dithering.
  3. Explore WASM Integration if needed Prototype a WASM module for the most performance-critical parts of the decompression (e.g., large array processing loops) and benchmark against pure JavaScript.
  4. Integrate with RootInteractive Data Flow: Hook the new decompression logic into RootInteractive's data loading and rendering pipeline
  5. Benchmarking: Conduct thorough browser-side benchmarks comparing:
    • Raw Float32Array transfer/processing.
    • Compressed Uint8Array/Uint16Array + JS decompression.
    • Compressed Uint8Array/Uint16Array + WASM decompression.
    • Metrics: data transfer size, decode time, rendering frame rate.
  6. Documentation: Document the new compression scheme, its benefits, usage, and API for RootInteractive developers and users.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions