Hawkynt.Compression.Core

Pure-managed compression primitives extracted from CompressionWorkbench. Every algorithm is implemented from scratch — no native dependency on zlib, liblzma, libarchive, or any other third-party compression library. The package is single-purpose: it bundles only Compression.Registry (the building-block + format-descriptor interfaces) directly into lib/, so consumers add a single dependency and get the full primitive surface. PCM audio, archive formats, and filesystem readers ship as separate single-responsibility packages — alongside the sibling Hawkynt.FileFormats.Images — and are not pulled in transitively by Hawkynt.Compression.Core:

Hawkynt.FileFormats.Audio — PCM / FLAC / WAV / AIFF / AU / Vorbis / Opus / AAC / MP3
Hawkynt.FileFormats.Archives — ZIP / TAR / 7Z / RAR / and the long tail
Hawkynt.FileFormats.FileSystems — FAT / ext4 / NTFS / HFS+ / APFS / Btrfs / etc.

When to use this package

Reach for Compression.Core when any of these is true:

You need a compression algorithm that's not in the BCL (LZW / LZMA / LZHAM / Brotli / Zopfli / FSE / rANS / PAQ8 / context-mixing / Burrows-Wheeler / arithmetic / range / Tunstall / DMC / etc.)
You need a fully managed implementation (e.g. for AOT, deterministic-build, or environments where binding against libzlib is forbidden by security review)
You need fine-grained control: bit-level I/O, raw entropy stages, transform composition, custom match finders — building blocks compose freely
You're benchmarking algorithms against each other and want a level playing field — every codec here uses the same byte[] Compress(ReadOnlySpan<byte>) / byte[] Decompress(ReadOnlySpan<byte>) shape

Skip it when:

You only need DEFLATE / GZIP / Brotli / ZLib at default settings — System.IO.Compression is faster on hot paths because it ships with native code paths
You need format-level handling (ZIP / TAR / 7Z / RAR archives) — wait for Hawkynt.CompressionWorkbench.Lib (sister package, not yet on NuGet) or use the source repo

Quick start

Algorithm round-trip

Every building block exposes the same two-method shape via the Compression.Registry.IBuildingBlock interface:

using Compression.Core.Dictionary.Lzw;
using Compression.Registry;   // IBuildingBlock + BuildingBlockRegistry

IBuildingBlock lzw = BuildingBlockRegistry.GetById("BB_Lzw")!;
byte[] compressed = lzw.Compress(originalBytes);
byte[] roundTripped = lzw.Decompress(compressed);

Or call the concrete codec directly when you want non-default parameters:

using var ms = new MemoryStream();
var encoder = new LzwEncoder(ms, minBits: 9, maxBits: 12);
encoder.Encode(originalBytes);

ms.Position = 0;
var decoder = new LzwDecoder(ms, minBits: 9, maxBits: 12);
byte[] roundTripped = decoder.Decode(originalBytes.Length);

Hashing

using Compression.Core.Hashing;

uint crc      = Crc32C.Compute(data);  // hardware-accelerated SSE4.2 / ARM CRC32C
uint xx32     = XxHash32.Compute(data);
ulong fnv     = Fnv1a64.Compute(data);
byte[] sha256 = Sha256.Compute(data);

Bit I/O

using Compression.Core.BitIO;

using var ms = new MemoryStream();
var writer = new BitWriter(ms, BitOrder.MsbFirst);
writer.WriteBits(0b1011_0010, count: 8);
writer.WriteBits(0xF,         count: 4);
writer.Flush();

ms.Position = 0;
var reader = new BitReader(ms, BitOrder.MsbFirst);
int byteVal = reader.ReadBits(8);   // 0xB2
int nibble  = reader.ReadBits(4);   // 0xF

Discovering building blocks

foreach (var bb in BuildingBlockRegistry.All)
  Console.WriteLine($"{bb.Id,-18} {bb.Family,-14} {bb.Description}");

The benchmark tool in the source repo (cwb benchmark <file>) iterates this same list and ranks every algorithm by ratio + compress / decompress wall time on a single input.

Choosing the right algorithm

You want…	Pick
Fastest possible round-trip on logs / JSON / mixed data	`BB_Lz4`, `BB_Snappy`, `BB_Pithy`, `BB_Lzfx`, `BB_Lzo`, `BB_Lzjb`
Best ratio with reasonable CPU budget (general purpose)	`BB_Lzma`, `BB_Brotli`, `BB_Zstd-via-FSE`
Best ratio at any CPU cost (research / archival)	`BB_PAQ8`, `BB_Cmix`, `BB_Mcm`, `BB_Bsc`, `BB_Csc`, `BB_Zpaq`
Streamable byte-level codec (network / pipe)	`BB_Deflate`, `BB_Lz4`, `BB_Snappy`
Lossless image-friendly	`BB_Deflate` (PNG layer), `BB_Lzma` (lossless), `BB_Brotli` (modern web)
Domain transform first, then entropy code	`BB_Bwt` → `BB_Mtf` → `BB_Huffman` (the bzip2 stack)
Adaptive entropy stage	`BB_Arithmetic`, `BB_RangeCoding`, `BB_rANS`, `BB_FSE`
Static entropy stage (precomputed table)	`BB_Huffman`, `BB_Tunstall`, `BB_ShannonFano`
Geometric integer distributions	`BB_Golomb`, `BB_Rice`, `BB_ExpGolomb`, `BB_EliasGamma`, `BB_EliasDelta`
Universal integer codes (no model)	`BB_Unary`, `BB_Fibonacci`, `BB_Levenshtein`
DNA / 4-symbol alphabets	`BB_Dna` (specialised)
Hardware-friendly chunked compression	`BB_842` (IBM hardware-style 2 / 4 / 8-byte template matching)
Network-protocol-style framing (packet sizes)	`BB_Lzs` (Stac LZS — RFC 1967 / RFC 2395)
Templated grammar / repetition mining	`BB_RePair`, `BB_Bpe`, `BB_Lzwl`

When in doubt: run cwb benchmark from the source repo on a representative sample of your data and let the numbers pick.

Building blocks reference

Every entry below registers via IBuildingBlock, exposes both Compress and Decompress, and was verified via the project's round-trip test matrix. Each row links to the algorithm's primary reference or canonical specification.

Id	Algorithm	Family	State	Description	Reference
`BB_Deflate`	DEFLATE	Dictionary	R/W	LZ77 + Huffman, the algorithm inside gzip / zip / png	RFC 1951
`BB_Deflate64`	Deflate64	Dictionary	R/W	Enhanced DEFLATE with 64 KB window	MS-ZIP
`BB_Lz77`	LZ77	Dictionary	R/W	Sliding-window dictionary with distance / length tokens	Ziv & Lempel 1977
`BB_Lz78`	LZ78	Dictionary	R/W	Phrase-dictionary precursor to LZW	Ziv & Lempel 1978
`BB_Lzw`	LZW	Dictionary	R/W	Dictionary coding used in GIF / Unix `compress`	Welch 1984
`BB_Lzo`	LZO1X	Dictionary	R/W	Extremely fast, decompression-optimised	oberhumer.com
`BB_Lzss`	LZSS	Dictionary	R/W	LZ77 variant with flag-bit encoding	Storer & Szymanski 1982
`BB_Lz4`	LZ4	Dictionary	R/W	Extremely fast LZ77-family block compression	LZ4 block format
`BB_Snappy`	Snappy	Dictionary	R/W	Fast LZ77-family compression (Google)	Snappy format
`BB_Brotli`	Brotli	Dictionary	R/W	LZ77 + Huffman with static dictionary (Google)	RFC 7932
`BB_Lzma`	LZMA	Dictionary	R/W	LZ + Markov chain + range coding	7-Zip LZMA SDK
`BB_Lzx`	LZX	Dictionary	R/W	LZ77 + Huffman used in CAB / CHM / WIM	MS-PATCH LZX
`BB_Xpress`	XPRESS Huffman	Dictionary	R/W	Windows XPRESS (NTFS / WIM / Hyper-V)	MS-XCA
`BB_Lzh`	LZH (LH5)	Dictionary	R/W ¹	Lempel-Ziv with adaptive Huffman, used in LHA	LZH format doc
`BB_Arj`	ARJ	Dictionary	R/W ¹	Modified LZ77 + Huffman from ARJ archives	ARJ technote
`BB_Lzms`	LZMS	Dictionary	R/W	LZ + Markov + Shannon delta (Windows WIM / ESD)	MS-XCA LZMS
`BB_Lzp`	LZP	Dictionary	R/W	Lempel-Ziv Prediction with context-based match prediction	Bloom 1996
`BB_Ace`	ACE	Dictionary	R/W	LZ77 + Huffman from ACE archive format	unace-nonfree
`BB_Rar`	RAR5	Dictionary	R/W	LZ + Huffman + PPM from RAR5	rarlab technote
`BB_Sqx`	SQX	Dictionary	R/W ¹	LZ + Huffman from the SQX archive format	SqxFormat notes
`BB_ROLZ`	ROLZ	Dictionary	R/W	Reduced-Offset LZ with context-based match tables	encode.su
`BB_LZHAM`	LZHAM	Dictionary	R/W	LZ77 + Huffman, inspired by LZHAM codec	LZHAM repo
`BB_Lzs`	LZS	Dictionary	R/W	Stac LZS (7 / 11-bit offset LZSS for networking)	RFC 1967 / RFC 2395
`BB_Lzwl`	LZWL	Dictionary	R/W	LZW with variable-length initial alphabet from digram analysis	LZWL paper
`BB_RePair`	Re-Pair	Dictionary	R/W	Recursive Pairing, offline grammar-based compression	Larsson & Moffat 1999
`BB_842`	842	Dictionary	R/W	IBM 842 hardware compression with 2 / 4 / 8-byte template matching	Linux `crypto/842*`
`BB_PPM`	PPM	Context-Mixing	R/W	Prediction by Partial Matching, order-2 context modelling	Cleary & Witten 1984
`BB_CTW`	CTW	Context-Mixing	R/W	Context Tree Weighting — optimal universal compression	Willems 1995
`BB_Huffman`	Huffman	Entropy	R/W	Optimal prefix-free entropy coding	Huffman 1952
`BB_Arithmetic`	Arithmetic	Entropy	R/W	Order-0 arithmetic coding with frequency table	Witten / Neal / Cleary 1987
`BB_ShannonFano`	Shannon-Fano	Entropy	R/W	Recursive frequency-splitting precursor to Huffman	Shannon 1948
`BB_Golomb`	Golomb / Rice	Entropy	R/W	Optimal coding for geometric distributions	Golomb 1966
`BB_Fibonacci`	Fibonacci	Entropy	R/W	Universal code with `11` terminators (Zeckendorf)	Apostolico & Fraenkel 1987
`BB_FSE`	FSE / tANS	Entropy	R/W	Table-based ANS (Zstd's entropy stage)	Duda 2013 / Collet's blog
`BB_BPE`	Byte Pair Encoding	Entropy	R/W	Iterative most-frequent-pair replacement	Gage 1994
`BB_RangeCoding`	Range coding	Entropy	R/W	Byte-oriented arithmetic coding with carryless normalisation	Martin 1979
`BB_rANS`	rANS	Entropy	R/W	Range ANS, used in AV1 and LZFSE	Duda 2013
`BB_ExpGolomb`	Exp-Golomb	Entropy	R/W	Exponential Golomb, used in H.264 / H.265	Teuhola 1978
`BB_Unary`	Unary	Entropy	R/W	Simplest universal code: N ones followed by a zero	—
`BB_EliasGamma`	Elias gamma	Entropy	R/W	Universal code using a unary length prefix	Elias 1975
`BB_EliasDelta`	Elias delta	Entropy	R/W	Gamma-codes the bit length	Elias 1975
`BB_Levenshtein`	Levenshtein	Entropy	R/W	Self-delimiting universal code with recursive length prefixing	Levenshtein 1968
`BB_Tunstall`	Tunstall	Entropy	R/W	Variable-to-fixed code, dual of Huffman	Tunstall 1967 (PhD thesis)
`BB_Dmc`	DMC	Entropy	R/W	Dynamic Markov Compression, bit-level FSM with state cloning	Cormack & Horspool 1987
`BB_Bwt`	BWT	Transform	R/W	Burrows-Wheeler Transform, reorders bytes for better compression	Burrows & Wheeler 1994
`BB_Mtf`	MTF	Transform	R/W	Move-to-Front Transform	Bentley et al. 1986
`BB_Delta`	Delta	Transform	R/W	Delta filter, stores differences between consecutive bytes	—
`BB_Rle`	RLE	Transform	R/W	Run-Length Encoding	—
`BB_Dpcm`	DPCM	Transform	R/W	Differential PCM, sample-to-sample differences	—

¹ Known edge cases on certain pathological inputs — see the source repo's known-fixes log. The mainline round-trip works for all real-world data we've tested.

Module overview

Module	Surface	What's in it
`Compression.Core.BitIO`	`BitReader`, `BitWriter`, `BitBuffer`, `BitOrder`	LSB / MSB-first bit streams over `Stream`. Foundation for every entropy / variable-length coder in the package.
`Compression.Core.Hashing`	`Crc32`, `Crc32C`, `Adler32`, `XxHash32`, `Fletcher16/32/64`, `MurmurHash3`, `Fnv1` / `Fnv1a`, `Sha1`, `Sha256`, `Md5`, `Blake2b`	Non-cryptographic checksums and a few crypto hashes. CRC-32C uses SSE4.2 / ARM CRC32C instructions where available, with a slicing-by-4 software fallback.
`Compression.Core.Crypto`	`AesCbc`, `AesCtr`, `Blowfish`, `Pbkdf2`, `ZipCrypto`	Format-required ciphers (ZipCrypto for legacy ZIP, AES-256 for ZIP-AE).
`Compression.Core.DataStructures`	`SlidingWindow`, `PriorityQueue`, `Trie`, `SuffixTree`	Generic structures used by match finders and grammar coders.
`Compression.Core.Deflate`	`DeflateEncoder`, `DeflateDecoder`, `Zopfli`, `MsZip`	DEFLATE pipeline plus Zopfli's iterative ratio-optimisation pass and Microsoft's MS-ZIP variant.
`Compression.Core.Dictionary.*`	One namespace per dictionary algorithm	LZ77 / LZ78 / LZW / LZSS / LZ4 / Snappy / Brotli / LZMA / LZX / XPRESS / LZH / ARJ / LZMS / LZP / ACE / RAR5 / SQX / LZO1X / LZHAM / LZS / LZWL / Re-Pair / 842, plus building-block variants (LZJB / LZF / LZRW1 / LZRW3 / FastLZ / Lizard / QuickLZ / Pithy / Crush / LZG / LZRLE / LZMAT / Shoco / DnaCompression).
`Compression.Core.Entropy.*`	One namespace per entropy coder	Huffman (canonical) / Shannon-Fano / Arithmetic / Range / FSE / rANS / PPMd / context-mixing (PAQ8 / cmix / MCM / BCM / BSC / BALZ / CSC / Zling / Lizard / QuickLZ-as-entropy) / Golomb / Rice / Elias Gamma / Elias Delta / Fibonacci / Tunstall / Exp-Golomb / Unary / Levenshtein / DMC.
`Compression.Core.Transforms`	`Bwt`, `Mtf`, `Rle`, `Delta`, `Bcj`, `Bcj2`, `Bpe`, `Dpcm`, `RePair`	Reversible byte-rearrangement passes — staged before entropy coding. BCJ family is platform-specific branch-relative-to-absolute conversion (x86 / ARM / Thumb / PPC / SPARC / IA64).
`Compression.Core.Simd`	`SimdMatchLength`, `SimdMemCopy`, `SimdHistogram`, `SimdRunScan`	Vector-accelerated primitives for hot match-finding paths.
`Compression.Core.Streams`	`SubStream`, `ConcatStream`, etc.	Lightweight stream shims commonly needed when writing format readers.
`Compression.Core.DiskImage`	`MbrParser`, `GptParser`, `PartitionTypeDatabase`, `PartitionEntry`	Header-only parsers for partition tables — handy when dealing with disk-image archives.
`Compression.Core.Progress`	`IProgress<>`, `Progress` helpers	Cancellable progress reporting for long compressors.

Versioning

Versions follow MAJOR.MINOR.PATCH.BUILD where BUILD is git rev-list --count HEAD of the source repo, auto-stamped at release time by scripts/version.pl. The package is pre-1.0 — public surface may rearrange between releases. Once we ship 1.0.0, breaking changes follow SemVer (MAJOR bump).

If you need a pinned reference, lock the exact 4-segment version:

<PackageReference Include="Compression.Core" Version="[1.0.0.123]" />

Source, tests, and benchmarks

Lives in Hawkynt/CompressionWorkbench. The source repo carries:

A NUnit test suite (~10 000 cases including round-trip sweeps for every building block, the cwb benchmark matrix, and an external-tool interop suite that round-trips through gzip / 7z / xz etc.)
A WPF UI (Compression.UI) with a benchmark visualiser and an archive browser
A CLI (cwb) with compress / extract / benchmark / analyse / auto-extract / suggest subcommands
~215 archive / file-system / image-format readers built on top of these primitives

Contributing

Open an issue or PR on the source repo. The CI runs the full test matrix on Windows and Linux and gates merges on a green test run.

License

LGPL-3.0-or-later. See LICENSE in the source repo.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hawkynt.Compression.Core

When to use this package

Quick start

Algorithm round-trip

Hashing

Bit I/O

Discovering building blocks

Choosing the right algorithm

Building blocks reference

Module overview

Versioning

Source, tests, and benchmarks

Contributing

License

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Hawkynt.Compression.Core

When to use this package

Quick start

Algorithm round-trip

Hashing

Bit I/O

Discovering building blocks

Choosing the right algorithm

Building blocks reference

Module overview

Versioning

Source, tests, and benchmarks

Contributing

License