93M Gaussians/sec read | 57M Gaussians/sec write | Auto-optimized
from gsply import plyread, GSData, GSTensor
# Read PLY file (auto-detects format, zero-copy)
data = plyread("model.ply") # Functional API
# Or use object-oriented API
data = GSData.load("model.ply") # Classmethod
# Access fields
positions = data.means # (N, 3) xyz coordinates
colors = data.sh0 # (N, 3) RGB colors
scales = data.scales # (N, 3) scale parameters
rotations = data.quats # (N, 4) quaternions
# Save PLY file
data.save("output.ply") # Uncompressed
data.save("output.ply", compressed=True) # Compressed (71-74% smaller)
# GPU acceleration (optional)
gstensor = GSTensor.load("model.ply", device='cuda')
gstensor.save("output.compressed.ply") # GPU compression

Performance: 93M Gaussians/sec read, 57M Gaussians/sec write (400K Gaussians in 6-7ms)
Ultra-fast Gaussian Splatting PLY I/O for Python. Zero-copy reads, auto-optimized writes, optional GPU acceleration.
- Ultra-Fast: 93M Gaussians/sec read, 57M Gaussians/sec write
- Zero-Copy: Reads use memory views for maximum performance
- Auto-Optimized: Writes are 2.6-2.8x faster automatically
- Format Support: Uncompressed PLY + PlayCanvas compressed (71-74% smaller) + SOG format
- GPU Ready: Optional PyTorch integration with GSTensor (11x faster transfers)
- Pure Python: NumPy + Numba (no C++ compilation required)
- Object-Oriented API: data.save(), GSData.load(), gstensor.save(), GSTensor.load()
- Format Conversion: normalize(), denormalize() with fused kernels (~8-15x faster)
- Color Conversion: to_rgb(), to_sh() for SH ↔ RGB conversion
- Comprehensive: 406 passing tests, full type hints, extensive documentation
pip install gsply

Core dependencies: NumPy and Numba (automatically installed)
GPU Acceleration (PyTorch):
pip install torch

Enables GSTensor, plyread_gpu(), plywrite_gpu(), and GPU-accelerated format conversions.
SOG Format Support:
pip install gsply[sogs]

Enables sogread() for reading SOG (Splat Ordering Grid) format files.
Full Installation:
pip install gsply[sogs] torch # GPU + SOG support

Development:
git clone https://github.com/OpsiClear/gsply.git
cd gsply
pip install -e .[dev] # Includes pytest, ruff, mypy

from gsply import plyread, plywrite, GSData
# Read PLY file
data = plyread("model.ply")
print(f"Loaded {len(data)} Gaussians")
# Object-oriented API
data = GSData.load("model.ply") # Auto-detects format
data.save("output.ply") # Uncompressed
data.save("output.ply", compressed=True) # Compressed
# Unpack to individual arrays
means, scales, quats, opacities, sh0, shN = data.unpack()
# Write with individual arrays
plywrite("output.ply", means, scales, quats, opacities, sh0, shN)

from gsply import GSData
import numpy as np
# Load PLY file (log-scales, logit-opacities)
data = GSData.load("scene.ply")
# Convert to linear format for easier manipulation
data.denormalize() # Uses fused kernel (~8-15x faster)
# Modify in linear space
data.opacities = np.clip(data.opacities * 1.2, 0, 1)
data.scales *= 1.1
# Convert back to PLY format before saving
data.normalize() # Uses fused kernel (~8-15x faster)
data.save("modified.ply")

from gsply import GSData, GSTensor, plyread_gpu, plywrite_gpu
# Direct GPU I/O (4-5x faster than CPU decompress + GPU transfer)
gstensor = plyread_gpu("model.compressed.ply", device='cuda')
# Or convert from CPU
data = GSData.load("model.ply")
gstensor = GSTensor.from_gsdata(data, device='cuda')
# Access GPU tensors
positions_gpu = gstensor.means # torch.Tensor on GPU
colors_gpu = gstensor.sh0 # torch.Tensor on GPU
# Filter on GPU
high_opacity = gstensor[gstensor.opacities > 0.5]
# GPU format conversion
gstensor.denormalize() # GPU-accelerated
# Write back (GPU compression)
gstensor.save("output.compressed.ply") # GPU compression by default

from gsply import GSData, GSTensor
import numpy as np
import torch
# Create GSData from arrays with format preset
data = GSData.from_arrays(
    means=np.random.randn(1000, 3),
    scales=np.random.rand(1000, 3),
    quats=np.random.randn(1000, 4),
    opacities=np.random.rand(1000),
    sh0=np.random.randn(1000, 3),
    format="linear" # or "auto", "ply", "rasterizer"
)
# Create GSTensor from tensors
gstensor = GSTensor.from_arrays(
    means=torch.randn(1000, 3),
    scales=torch.rand(1000, 3),
    quats=torch.randn(1000, 4),
    opacities=torch.rand(1000),
    sh0=torch.randn(1000, 3),
    format="linear",
    device="cuda"
)

from gsply import GSData
# Slicing and indexing
subset = data[100:200] # Slice
first = data[0] # Single Gaussian
filtered = data[data.opacities > 0.5] # Boolean mask
# Concatenation
combined = data1 + data2 # Pairwise (1.9x faster)
merged = GSData.concatenate([data1, data2, data3]) # Bulk (6.15x faster)
# Optimize for faster operations
data = data.make_contiguous() # 2-45x speedup for operations
# Copy and modify
bright = data.copy()
bright.sh0 *= 1.5 # Make brighter

from gsply import compress_to_bytes, decompress_from_bytes
# Compress for network transfer or storage
compressed_bytes = compress_to_bytes(data)
# Decompress from bytes
data_restored = decompress_from_bytes(compressed_bytes)

from gsply import sogread
# Read SOG file - returns GSData (same API as plyread)
data = sogread("model.sog")
positions = data.means
colors = data.sh0
# Read from bytes (in-memory, no disk I/O)
with open("model.sog", "rb") as f:
    sog_bytes = f.read()
data = sogread(sog_bytes) # Fully in-memory extraction

Uncompressed Format (400K Gaussians, SH0):
- Read: 5.7ms (70M Gaussians/sec)
- Write: 19.3ms (21M Gaussians/sec)
Compressed Format (400K Gaussians, SH0):
- Read: 8.5ms (47M Gaussians/sec)
- Write: 15.0ms (27M Gaussians/sec)
- Size reduction: 71-74%
Peak Performance:
- Read: 78M Gaussians/sec (1M Gaussians, SH0, uncompressed)
- Write: 29M Gaussians/sec (100K Gaussians, SH0, compressed)
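The throughput figures for the 400K-Gaussian runs above are simply the Gaussian count divided by the elapsed time. A minimal sketch of that arithmetic follows; the helper name `throughput_m_per_sec` is made up for this illustration and is not part of gsply.

```python
def throughput_m_per_sec(num_gaussians: int, elapsed_ms: float) -> float:
    """Return throughput in millions of Gaussians per second."""
    return num_gaussians / (elapsed_ms / 1000.0) / 1e6


# Timings quoted above for 400K Gaussians (SH0).
timings_ms = {
    "uncompressed read": 5.7,
    "uncompressed write": 19.3,
    "compressed read": 8.5,
    "compressed write": 15.0,
}

for name, ms in timings_ms.items():
    print(f"{name}: {throughput_m_per_sec(400_000, ms):.0f}M Gaussians/sec")
```

Running it reproduces the 70M, 21M, 47M, and 27M Gaussians/sec values listed for the 400K case.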
GPU Transfer (400K Gaussians, RTX 3090 Ti):
- With _base optimization: 1.99ms (11x faster, single tensor transfer)
- Without _base: 22.78ms (CPU copy + transfer)
Format Conversion (Fused Kernels):
- normalize()/denormalize(): ~8-15x faster with parallel Numba kernels
- Single-pass processing reduces memory overhead
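As noted in the examples, PLY files store log-scales and logit-opacities, so converting to linear space means applying `exp` to scales and a sigmoid to opacities. The sketch below shows what "single-pass" can mean: both fields are converted in one loop over the Gaussians rather than in separate passes. It is plain Python for illustration only; the function names are hypothetical, and gsply's actual Numba kernels are not reproduced here.

```python
import math


def denormalize_single_pass(log_scales, logit_opacities):
    """Convert PLY-space values to linear space in one loop over Gaussians."""
    scales, opacities = [], []
    for ls, lo in zip(log_scales, logit_opacities):
        scales.append(tuple(math.exp(v) for v in ls))   # log-scale -> scale
        opacities.append(1.0 / (1.0 + math.exp(-lo)))   # logit -> opacity
    return scales, opacities


def normalize_single_pass(scales, opacities, eps=1e-7):
    """Inverse mapping: linear space back to PLY space."""
    log_scales, logits = [], []
    for s, o in zip(scales, opacities):
        log_scales.append(tuple(math.log(v) for v in s))
        o = min(max(o, eps), 1.0 - eps)                 # keep the logit finite
        logits.append(math.log(o / (1.0 - o)))
    return log_scales, logits


log_scales = [(-2.0, -1.5, -3.0), (0.0, 0.5, -0.5)]
logit_opacities = [2.0, -1.0]

scales, opacities = denormalize_single_pass(log_scales, logit_opacities)
round_trip_scales, round_trip_logits = normalize_single_pass(scales, opacities)
```

The clamp with `eps` is an assumption added here so that opacities of exactly 0 or 1 do not produce infinite logits.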
See detailed performance benchmarks for more information.
Standard binary little-endian PLY format:
| SH Degree | Properties | Description |
|---|---|---|
| 0 | 14 | xyz, f_dc(3), opacity, scales(3), quats(4) |
| 1 | 23 | + 9 f_rest coefficients |
| 2 | 38 | + 24 f_rest coefficients |
| 3 | 59 | + 45 f_rest coefficients |
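The property counts in the table follow a simple pattern: 14 base properties, plus 3 color channels times the number of higher-order SH coefficients, which is (degree + 1)² − 1 per channel. A small sketch of that relationship, using hypothetical helper names that are not part of gsply:

```python
BASE_PROPERTIES = 3 + 3 + 1 + 3 + 4  # xyz, f_dc, opacity, scales, quats = 14


def ply_property_count(sh_degree: int) -> int:
    """Number of per-vertex properties for a given SH degree (0-3)."""
    if not 0 <= sh_degree <= 3:
        raise ValueError("SH degree must be between 0 and 3")
    f_rest = 3 * ((sh_degree + 1) ** 2 - 1)  # higher-order coefficients, 3 channels
    return BASE_PROPERTIES + f_rest


def sh_degree_from_property_count(count: int) -> int:
    """Inverse lookup: recover the SH degree from a property count."""
    for degree in range(4):
        if ply_property_count(degree) == count:
            return degree
    raise ValueError(f"{count} properties does not match any SH degree 0-3")


for degree in range(4):
    print(degree, ply_property_count(degree))
```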
Chunk-based quantized format:
- Automatically saves as .compressed.ply when compressed=True
- Compression ratio: 71-74% size reduction
- Compatible with PlayCanvas, SuperSplat, other WebGL viewers
- Parallel compression/decompression with Numba JIT
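To give a feel for what "chunk-based quantized" means, here is a simplified sketch: values are split into fixed-size chunks, each chunk stores its own minimum and maximum, and every value is stored as a small integer code within that range. The chunk size of 256 and the 8-bit code width are assumptions chosen for the illustration; this does not reproduce the actual PlayCanvas bit layout or gsply's implementation.

```python
import math


def quantize_chunks(values, chunk_size=256, bits=8):
    """Quantize values per chunk to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    chunks = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        lo, hi = min(chunk), max(chunk)
        span = hi - lo
        # A constant chunk has zero span; all of its values map to code 0.
        codes = [round((v - lo) / span * levels) if span > 0 else 0 for v in chunk]
        chunks.append((lo, hi, codes))
    return chunks


def dequantize_chunks(chunks, bits=8):
    """Reconstruct approximate values from per-chunk ranges and codes."""
    levels = (1 << bits) - 1
    out = []
    for lo, hi, codes in chunks:
        span = hi - lo
        out.extend(lo + code / levels * span for code in codes)
    return out


values = [math.sin(i * 0.01) for i in range(1000)]
restored = dequantize_chunks(quantize_chunks(values))
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(f"max reconstruction error: {max_error:.5f}")
```

Because each chunk carries its own range, the worst-case error is half a quantization step of that chunk's span, which is why per-chunk ranges preserve more precision than a single global range.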
WebP-based texture format for web deployment:
- Requires gsply[sogs] installation
- Uses WebP images for efficient storage
- Codebook-based compression for scales and colors
- Compatible with PlayCanvas splat-transform
- Supports both .sog ZIP bundles and folder formats
- Returns GSData container (same API as plyread())
- In-memory ZIP extraction: Can read directly from bytes
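Reading a ZIP bundle directly from bytes, as the last point describes, can be done with the standard library alone. The sketch below shows the general technique with `zipfile` and `io.BytesIO`; the helper names and the member names in the stand-in bundle are made up for the demo, and this is not gsply's own extraction code.

```python
import io
import zipfile


def list_bundle_members(bundle_bytes: bytes) -> dict:
    """Map each member name in a ZIP bundle to its uncompressed size, without disk I/O."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}


def read_member(bundle_bytes: bytes, name: str) -> bytes:
    """Return the raw bytes of one member of an in-memory ZIP bundle."""
    with zipfile.ZipFile(io.BytesIO(bundle_bytes)) as archive:
        return archive.read(name)


# Build a tiny stand-in bundle in memory (member names are hypothetical).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("meta.json", '{"count": 2}')
    archive.writestr("means.webp", b"\x00\x01\x02\x03")
bundle = buffer.getvalue()

members = list_bundle_members(bundle)
payload = read_member(bundle, "means.webp")
```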
Complete API documentation: docs/API_REFERENCE.md
- plyread(file_path) - Read PLY files (auto-detects format)
- plywrite(file_path, ...) - Write PLY files
- detect_format(file_path) - Detect format and SH degree
- sogread(file_path | bytes) - Read SOG files (optional)

- GSData.load(file_path) - Load from PLY (classmethod)
- data.save(file_path, compressed=False) - Save to PLY
- GSData.from_arrays(...) - Create from arrays with format preset
- GSData.from_dict(...) - Create from dictionary with format preset
- data.normalize() / data.denormalize() - Format conversion (fused kernels, ~8-15x faster)
- data.to_rgb() / data.to_sh() - Color conversion
- Format query: is_scales_ply, is_scales_linear, is_opacities_ply, is_opacities_linear, is_sh0_sh, is_sh0_rgb, is_sh_order_0/1/2/3
- data[index] - Indexing and slicing
- data.unpack() - Unpack to tuple
- data.copy() - Deep copy

- compress_to_bytes(data) - Compress to bytes
- compress_to_arrays(data) - Compress to arrays
- decompress_from_bytes(bytes) - Decompress from bytes

- sh2rgb(sh) / rgb2sh(rgb) - Color conversion
- logit(x) / sigmoid(x) - Optimized CPU functions
- apply_pre_activations(data, ...) - Fused activation kernel (~8-15x faster)
- apply_pre_deactivations(data, ...) - Fused deactivation kernel (~8-15x faster)
- SH_C0 - Normalization constant

- GSTensor.load(file_path, device='cuda') - Load from PLY (classmethod)
- gstensor.save(file_path, compressed=True) - Save with GPU compression
- GSTensor.from_arrays(...) - Create from tensors with format preset
- GSTensor.from_gsdata(data, device='cuda') - Convert to GPU
- gstensor.to_gsdata() - Convert to CPU
- gstensor.normalize() / gstensor.denormalize() - GPU format conversion
- gstensor.to_rgb() / gstensor.to_sh() - GPU color conversion
- Format query: is_scales_ply, is_scales_linear, is_opacities_ply, is_opacities_linear, is_sh0_sh, is_sh0_rgb, is_sh_order_0/1/2/3
- Device management: .to(), .cpu(), .cuda()
- Precision: .half(), .float(), .double()
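For readers unfamiliar with the color and activation helpers listed above, the following stand-alone sketch shows the math they usually refer to in Gaussian Splatting code. It assumes gsply follows the common 3DGS convention, where the degree-0 SH constant is 1 / (2√π) and RGB is recovered as `sh * SH_C0 + 0.5`; the scalar functions below are re-implementations for illustration, not gsply's optimized versions.

```python
import math

SH_C0 = 0.28209479177387814  # 1 / (2 * sqrt(pi)), degree-0 SH basis constant


def sh2rgb(sh: float) -> float:
    """Degree-0 SH coefficient -> RGB value (assumed 3DGS convention)."""
    return sh * SH_C0 + 0.5


def rgb2sh(rgb: float) -> float:
    """RGB value -> degree-0 SH coefficient (inverse of sh2rgb)."""
    return (rgb - 0.5) / SH_C0


def sigmoid(x: float) -> float:
    """Logit-opacity -> opacity in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Opacity in (0, 1) -> logit-opacity (inverse of sigmoid)."""
    return math.log(p / (1.0 - p))


print(sh2rgb(0.0))              # a zero SH coefficient maps to mid-gray
print(rgb2sh(sh2rgb(1.25)))     # round trip returns approximately 1.25
print(logit(sigmoid(-0.75)))    # round trip returns approximately -0.75
```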
- torch.compile() Auto-Optimization: GPU compression now uses torch.compile() when available
  - ~25% faster GPU compression (5.0ms → 4.0ms for 365K Gaussians)
  - Automatic fallback to eager mode if compilation fails
  - Zero configuration required - works transparently
- GPU Rounding Fix: Fixed quaternion quantization to match CPU behavior exactly
- Platform Support: Triton backend on Linux (standard), Windows (requires triton-windows package)
- Protocol Interfaces: Type-safe interfaces for format management
  - FormatAware - Protocol for objects that track format state
  - Normalizable - Protocol for objects that support format conversion
  - GaussianContainer - Protocol for objects containing Gaussian splat data
  - Enables structural typing across GSData and GSTensor
- Format Management API: Advanced format control methods
  - format_state property - Returns current format as immutable dictionary
  - copy_format_from(other) - Copy format state from another object
  - with_format(**kwargs) - Create shallow copy with modified format
- Performance Optimizations: Improved efficiency for common operations
  - Removed auto-consolidate overhead in plywrite() - users can call make_contiguous() manually when needed
  - In-place format conversion now default (inplace=True) - reduces memory allocations
  - Better performance for already-contiguous data
- Format Query Properties: Convenient boolean properties to check current data format
  - is_scales_ply, is_scales_linear - Check scale format
  - is_opacities_ply, is_opacities_linear - Check opacity format
  - is_sh0_sh, is_sh0_rgb - Check color format
  - is_sh_order_0/1/2/3 - Check SH degree
  - Available on both GSData and GSTensor
- Fused Activation Kernels: Ultra-fast format conversion with parallel Numba kernels
  - apply_pre_activations() - Fused kernel for activating scales, opacities, and quaternions (~8-15x faster)
  - apply_pre_deactivations() - Fused kernel for deactivating scales and opacities (~8-15x faster)
  - normalize() and denormalize() now use optimized fused kernels internally
  - Single-pass processing reduces memory overhead and improves cache locality
- Convenience Factory Methods: GSData.from_arrays(), GSData.from_dict(), GSTensor.from_arrays(), GSTensor.from_dict()
- Auto-Format Detection: Smart heuristics automatically detect PLY vs Linear format
- Format Safety: Strict validation prevents mixing incompatible formats
- Format Helpers: create_ply_format(), create_rasterizer_format()
- SOG Format Support: sogread() - Read SOG (Splat Ordering Grid) format files
- Object-Oriented I/O API: data.save(), GSData.load(), gstensor.save(), GSTensor.load()
- Format Conversion API: normalize(), denormalize() for linear ↔ PLY format conversion
- Color Conversion API: to_rgb(), to_sh() for SH ↔ RGB conversion
Full Changelog • API Reference
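The changelog mentions protocol interfaces that enable structural typing across GSData and GSTensor. As a rough illustration of the idea, the sketch below defines a `Normalizable` protocol with `typing.Protocol` and two unrelated classes that satisfy it without inheriting from it. The method signatures and the `CpuContainer` and `GpuContainer` classes are hypothetical stand-ins; gsply's real protocol definitions may differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Normalizable(Protocol):
    """Anything that can convert between PLY and linear formats (assumed shape)."""

    def normalize(self) -> None: ...

    def denormalize(self) -> None: ...


class CpuContainer:
    """Hypothetical stand-in for a CPU-side container."""

    def __init__(self) -> None:
        self.is_linear = False

    def normalize(self) -> None:
        self.is_linear = False

    def denormalize(self) -> None:
        self.is_linear = True


class GpuContainer:
    """Hypothetical stand-in for a GPU-side container; no shared base class."""

    def __init__(self) -> None:
        self.is_linear = False

    def normalize(self) -> None:
        self.is_linear = False

    def denormalize(self) -> None:
        self.is_linear = True


def to_linear(obj: Normalizable) -> None:
    """Works with any object that structurally matches Normalizable."""
    obj.denormalize()


cpu, gpu = CpuContainer(), GpuContainer()
for container in (cpu, gpu):
    to_linear(container)
```

The benefit of this design is that a function such as `to_linear` can be type-checked against both containers without forcing them into a common class hierarchy.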
# Clone repository
git clone https://github.com/OpsiClear/gsply.git
cd gsply
# Install in development mode
pip install -e .[dev]
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=gsply --cov-report=html

gsply/
├── src/gsply/ # Source code
│ ├── gsdata.py # GSData dataclass
│ ├── reader.py # PLY reading
│ ├── writer.py # PLY writing
│ ├── formats.py # Format detection
│ ├── utils.py # Utility functions (fused kernels)
│ └── torch/ # PyTorch integration
│     └── gstensor.py # GSTensor GPU dataclass
├── tests/ # Unit tests (406 tests)
├── benchmarks/ # Performance benchmarks
├── docs/ # Documentation
└── pyproject.toml # Package configuration
gsply has comprehensive test coverage with 406 passing tests:
# Run all tests
pytest tests/ -v
# Run PyTorch tests (requires torch)
pytest tests/ -v -k "torch or gstensor"
# Run with coverage
pytest tests/ -v --cov=gsply --cov-report=html

# Install benchmark dependencies
pip install -e .[benchmark]
# Run benchmark
python benchmarks/benchmark.py

- API Reference - Complete API documentation
- Changelog - Version history and release notes
- Contributing - Contribution guidelines
Contributions are welcome! Please see docs/CONTRIBUTING.md for guidelines.
Quick start:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run tests: pytest tests/ -v
- Submit a pull request
MIT License - see LICENSE file for details.
If you use gsply in your research, please cite:
@software{gsply2024,
author = {OpsiClear},
title = {gsply: Ultra-Fast Gaussian Splatting PLY I/O},
year = {2024},
url = {https://github.com/OpsiClear/gsply}
}

- gsplat: CUDA-accelerated Gaussian Splatting rasterizer
- nerfstudio: NeRF training framework with Gaussian Splatting support
- PlayCanvas SuperSplat: Web-based Gaussian Splatting viewer
- 3D Gaussian Splatting: Original paper and implementation
Made with Python and NumPy