Blackhole EH is a research-oriented project exploring lossless image storage and deduplication based on block-level processing and reversible transforms.
The system converts images into fixed-size blocks, separates luminance and chrominance components,
and stores images locally in a custom binary format (.blho) while uploading unique blocks
to a server for deduplicated storage.
⚠️ Project status: this project is in an active R&D phase. It is not yet optimized for compression efficiency and should be considered a research prototype rather than a production-ready archiver.
Traditional image formats (JPEG, PNG, etc.) are optimized for single-file compression, but they do not address:
- Cross-image redundancy
- Block-level deduplication across datasets
- Research into alternative lossless representations beyond entropy coding
Blackhole EH investigates whether content-addressable, block-based storage can be used as a foundation for future lossless image archival systems.
Images are split into 8×8 pixel blocks. Each pixel is converted from RGB into Y/U/V components using a reversible integer color transform (RCT).
This guarantees:
- Exact reversibility
- No rounding error (integer arithmetic only)
- No loss of information
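The source does not state which RCT variant Blackhole EH uses. As an illustration only, the sketch below assumes the JPEG 2000 style reversible color transform; the class name `RctSketch` and its method names are hypothetical and not taken from the project.

```java
/** Hypothetical sketch: JPEG 2000-style reversible color transform (assumed variant). */
final class RctSketch {

    /** Forward transform. Inputs are 8-bit channel values (0..255); returns {y, u, v}. */
    static int[] forward(int r, int g, int b) {
        int y = (r + 2 * g + b) >> 2; // floor((R + 2G + B) / 4), stays within 0..255
        int u = b - g;                // range -255..255, needs more than 8 bits
        int v = r - g;                // range -255..255, needs more than 8 bits
        return new int[] {y, u, v};
    }

    /** Inverse transform. Returns {r, g, b}, exactly undoing forward(). */
    static int[] inverse(int y, int u, int v) {
        // The arithmetic shift floors toward negative infinity, which is what
        // makes the floor taken in forward() recoverable for negative u + v.
        int g = y - ((u + v) >> 2);
        int r = v + g;
        int b = u + g;
        return new int[] {r, g, b};
    }
}
```

Under this assumption, the Y value fits in 8 bits while U and V span -255..255, which is consistent with storing chroma in a wider signed type.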
For each 8×8 block:
- Y (luma) is stored as 8-bit values
- U and V (chroma) are stored as signed 16-bit values (packed as bytes)
Each component is processed independently.
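A minimal sketch of how one block's components could be serialized under the sizes described above: 64 bytes for Y and 128 bytes for each chroma component. The big-endian byte order and the `BlockPackingSketch` class are assumptions made for illustration; the project's actual byte layout is not specified here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of per-component block serialization. */
final class BlockPackingSketch {
    static final int BLOCK_PIXELS = 64; // 8 x 8

    /** Packs 64 luma values (0..255) into 64 bytes. */
    static byte[] packLuma(int[] y) {
        requireBlock(y.length);
        byte[] out = new byte[BLOCK_PIXELS];
        for (int i = 0; i < BLOCK_PIXELS; i++) {
            out[i] = (byte) y[i];
        }
        return out;
    }

    /** Reads luma bytes back as unsigned values. */
    static int[] unpackLuma(byte[] bytes) {
        requireBlock(bytes.length);
        int[] out = new int[BLOCK_PIXELS];
        for (int i = 0; i < BLOCK_PIXELS; i++) {
            out[i] = bytes[i] & 0xFF;
        }
        return out;
    }

    /** Packs 64 signed chroma values into 128 bytes (big-endian is an assumption). */
    static byte[] packChroma(int[] c) {
        requireBlock(c.length);
        ByteBuffer buf = ByteBuffer.allocate(BLOCK_PIXELS * Short.BYTES).order(ByteOrder.BIG_ENDIAN);
        for (int value : c) {
            buf.putShort((short) value);
        }
        return buf.array();
    }

    /** Reads 128 bytes back into 64 signed chroma values. */
    static int[] unpackChroma(byte[] bytes) {
        requireBlock(bytes.length / Short.BYTES);
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
        int[] out = new int[BLOCK_PIXELS];
        for (int i = 0; i < BLOCK_PIXELS; i++) {
            out[i] = buf.getShort(); // sign-extended to int
        }
        return out;
    }

    private static void requireBlock(int count) {
        if (count != BLOCK_PIXELS) {
            throw new IllegalArgumentException("Expected " + BLOCK_PIXELS + " values, got " + count);
        }
    }
}
```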
.blho files store the structural description of an image, not its pixel data.
- File header (BLHO, version 2)
- JSON metadata
- Lists of unique SHA-256 hashes for:
- Y blocks
- U blocks
- V blocks
- Position maps referencing these hashes
A .blho file does not contain:
- Raw block data
- Pixel values
- Compressed image bytes
This design allows .blho files to act as manifests that can reconstruct an image once the corresponding blocks are available.
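To make the manifest idea concrete, the sketch below derives a list of unique hashes and a position map from the per-block hashes of one component. It assumes that position maps hold indices into the unique-hash list and that blocks are listed in raster order; both are guesses for illustration, and `ManifestIndexSketch` is a hypothetical name, not the BLHO v2 layout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: unique-hash list plus position map for one component. */
final class ManifestIndexSketch {

    /** Unique hashes in first-seen order, and one index into that list per block position. */
    record ComponentIndex(List<String> uniqueHashes, int[] positionMap) {}

    /** Builds the index from block hashes given in raster order (assumed ordering). */
    static ComponentIndex build(List<String> blockHashes) {
        Map<String, Integer> indexByHash = new LinkedHashMap<>();
        int[] positionMap = new int[blockHashes.size()];
        for (int i = 0; i < positionMap.length; i++) {
            String hash = blockHashes.get(i);
            Integer index = indexByHash.get(hash);
            if (index == null) {
                index = indexByHash.size(); // next free slot in the unique list
                indexByHash.put(hash, index);
            }
            positionMap[i] = index;
        }
        return new ComponentIndex(new ArrayList<>(indexByHash.keySet()), positionMap);
    }
}
```

With this shape, a repeated block costs one position-map entry instead of a second copy of its hash.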
The processing pipeline consists of:
- Image loading
- Padding to multiples of 8×8 (edge pixels replicated)
- Block splitting
- RGB → RCT transform
- SHA-256 hashing of Y / U / V blocks
- Deduplication within the image
- .blho file generation
- Server check for missing blocks
- Upload only missing blocks
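The padding, splitting, and hashing steps above can be sketched as follows for a single 8-bit plane. This is an illustrative outline, not the project's `BlockSplitter`: the class `BlockPipelineSketch`, the row-major plane layout, and the lowercase hex encoding of hashes are all assumptions.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical sketch of padding, block extraction, and hashing for one 8-bit plane. */
final class BlockPipelineSketch {
    static final int N = 8;

    /** Rounds a dimension up to the next multiple of 8. */
    static int paddedSize(int size) {
        return (size + N - 1) / N * N;
    }

    /**
     * Extracts the 8x8 block at block coordinates (blockX, blockY) from a
     * row-major plane. Coordinates past the right or bottom edge are clamped,
     * which replicates the edge pixels into the padding area.
     */
    static byte[] extractBlock(int[] plane, int width, int height, int blockX, int blockY) {
        byte[] block = new byte[N * N];
        for (int dy = 0; dy < N; dy++) {
            int srcY = Math.min(blockY * N + dy, height - 1);
            for (int dx = 0; dx < N; dx++) {
                int srcX = Math.min(blockX * N + dx, width - 1);
                block[dy * N + dx] = (byte) plane[srcY * width + srcX];
            }
        }
        return block;
    }

    /** SHA-256 of the block bytes as lowercase hex (the encoding is an assumption). */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

Clamping the source coordinates avoids allocating a separate padded copy of the image; an implementation that pads first and then splits would produce the same blocks.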
The server stores blocks indexed by:
- SHA-256 hash
- Block type (Y/U/V)
For each processed image:
- The client checks which hashes already exist
- Only missing blocks are uploaded
- Duplicate blocks across images are stored once
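The check-then-upload exchange can be illustrated with an in-memory stand-in for the server. The sketch below is not the project's `BlockClient` or its server API; `DedupUploadSketch`, `InMemoryBlockStore`, and the `type:hash` key format are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of deduplicated upload against an in-memory block store. */
final class DedupUploadSketch {

    /** Stand-in for the block server: blocks indexed by block type and hash. */
    static final class InMemoryBlockStore {
        private final Map<String, byte[]> blocks = new HashMap<>();

        /** Returns the subset of hashes not yet stored for the given block type. */
        Set<String> findMissing(char type, Set<String> hashes) {
            Set<String> missing = new LinkedHashSet<>();
            for (String hash : hashes) {
                if (!blocks.containsKey(key(type, hash))) {
                    missing.add(hash);
                }
            }
            return missing;
        }

        /** Stores a block once; later writes for the same key are ignored. */
        void put(char type, String hash, byte[] data) {
            blocks.putIfAbsent(key(type, hash), data.clone());
        }

        int size() {
            return blocks.size();
        }

        private static String key(char type, String hash) {
            return type + ":" + hash;
        }
    }

    /** Uploads only the blocks the store lacks and returns how many were sent. */
    static int uploadMissing(InMemoryBlockStore store, char type, Map<String, byte[]> localBlocks) {
        Set<String> missing = store.findMissing(type, localBlocks.keySet());
        for (String hash : missing) {
            store.put(type, hash, localBlocks.get(hash));
        }
        return missing.size();
    }
}
```

Processing a second image that shares blocks with the first would then send only the blocks the store has not seen for that block type.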
All operations in Blackhole EH are bit-exact:
- Reversible integer color transform
- No quantization
- No floating-point arithmetic
- No entropy coding (yet)
Reconstruction is deterministic and fully lossless as long as:
- All referenced blocks are available
- Position maps are preserved
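The two conditions above can be seen in a minimal reconstruction sketch for one 8-bit plane. It assumes a position map of indices into the unique-hash list, blocks in raster order, and padding that is simply cropped on output; `ReconstructSketch` and its parameters are hypothetical and do not describe the actual BLHO reader.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rebuild one 8-bit plane from deduplicated 8x8 blocks. */
final class ReconstructSketch {
    static final int N = 8;

    /**
     * Rebuilds a width x height plane. positionMap holds one index into
     * uniqueHashes per block, in raster order (an assumed layout).
     */
    static int[] rebuildPlane(int width, int height, List<String> uniqueHashes,
                              int[] positionMap, Map<String, byte[]> blocksByHash) {
        int blocksPerRow = (width + N - 1) / N;
        int blocksPerCol = (height + N - 1) / N;
        if (positionMap.length != blocksPerRow * blocksPerCol) {
            throw new IllegalArgumentException("Position map does not match image size");
        }
        int[] plane = new int[width * height];
        for (int i = 0; i < positionMap.length; i++) {
            String hash = uniqueHashes.get(positionMap[i]);
            byte[] block = blocksByHash.get(hash);
            if (block == null) {
                // Reconstruction is only possible when every referenced block is available.
                throw new IllegalStateException("Missing block " + hash);
            }
            int originX = (i % blocksPerRow) * N;
            int originY = (i / blocksPerRow) * N;
            for (int dy = 0; dy < N && originY + dy < height; dy++) {
                for (int dx = 0; dx < N && originX + dx < width; dx++) {
                    // Pixels beyond width/height are padding and are skipped.
                    plane[(originY + dy) * width + originX + dx] = block[dy * N + dx] & 0xFF;
                }
            }
        }
        return plane;
    }
}
```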
Known limitations:
- .blho files are often larger than the original JPEG
- No entropy reduction beyond deduplication
- Position maps are not optimized
- No spatial prediction or delta coding (future research)
These limitations are intentional at this stage and are the subject of ongoing research.
The following topics are under investigation but not implemented yet:
- Spatial correlation analysis
- Delta (residual) representations
- Entropy estimation
- Hybrid block encoding (absolute vs delta)
- BLHO v3 format design
See project issues for detailed research tasks.
Key components:
- BlockSplitter: splits images into padded 8×8 RCT blocks
- BlhoWriter: generates .blho files (v2 format)
- FileProcessor: orchestrates image processing and server interaction
- BlockClient: communicates with the block storage server
Requirements:
- Java 21+ (tested with Java 25)
- Maven
- Spring Boot 3.x
Configure the image directory in application.yml, then run:
    mvn spring-boot:run

The application will:
- Process all JPG/JPEG images in the configured directory
- Generate .blho files
- Upload missing blocks to the server
This project is intended for:
- Researchers in lossless compression
- Engineers exploring content-addressable storage
- Developers interested in alternative image representations
It is not intended as a drop-in replacement for existing image codecs.
This project is proprietary and provided for research and review purposes only.
© 2025 Anatoliy Levitsky. All rights reserved.
Any use, reproduction, modification, or distribution of this software without explicit written permission from the author is prohibited.