Pure-managed compression primitives extracted from CompressionWorkbench. Every algorithm is implemented from scratch — no native dependency on zlib, liblzma, libarchive, or any other third-party compression library. The package is single-purpose: it bundles only
Compression.Registry(the building-block + format-descriptor interfaces) directly intolib/, so consumers add a single dependency and get the full primitive surface. PCM audio, archive formats, and filesystem readers ship as separate single-responsibility packages — alongside the siblingHawkynt.FileFormats.Images— and are not pulled in transitively byHawkynt.Compression.Core:
Hawkynt.FileFormats.Audio— PCM / FLAC / WAV / AIFF / AU / Vorbis / Opus / AAC / MP3Hawkynt.FileFormats.Archives— ZIP / TAR / 7Z / RAR / and the long tailHawkynt.FileFormats.FileSystems— FAT / ext4 / NTFS / HFS+ / APFS / Btrfs / etc.
Reach for Compression.Core when any of these is true:
- You need a compression algorithm that's not in the BCL (LZW / LZMA / LZHAM / Brotli / Zopfli / FSE / rANS / PAQ8 / context-mixing / Burrows-Wheeler / arithmetic / range / Tunstall / DMC / etc.)
- You need a fully managed implementation (e.g. for AOT, deterministic-build, or environments where
binding against
libzlibis forbidden by security review) - You need fine-grained control: bit-level I/O, raw entropy stages, transform composition, custom match finders — building blocks compose freely
- You're benchmarking algorithms against each other and want a level playing field — every codec here
uses the same
byte[] Compress(ReadOnlySpan<byte>)/byte[] Decompress(ReadOnlySpan<byte>)shape
Skip it when:
- You only need DEFLATE / GZIP / Brotli / ZLib at default settings —
System.IO.Compressionis faster on hot paths because it ships with native code paths - You need format-level handling (ZIP / TAR / 7Z / RAR archives) — wait for
Hawkynt.CompressionWorkbench.Lib(sister package, not yet on NuGet) or use the source repo
Every building block exposes the same two-method shape via the
Compression.Registry.IBuildingBlock interface:
using Compression.Core.Dictionary.Lzw;
using Compression.Registry; // IBuildingBlock + BuildingBlockRegistry
IBuildingBlock lzw = BuildingBlockRegistry.GetById("BB_Lzw")!;
byte[] compressed = lzw.Compress(originalBytes);
byte[] roundTripped = lzw.Decompress(compressed);Or call the concrete codec directly when you want non-default parameters:
using var ms = new MemoryStream();
var encoder = new LzwEncoder(ms, minBits: 9, maxBits: 12);
encoder.Encode(originalBytes);
ms.Position = 0;
var decoder = new LzwDecoder(ms, minBits: 9, maxBits: 12);
byte[] roundTripped = decoder.Decode(originalBytes.Length);using Compression.Core.Hashing;
uint crc = Crc32C.Compute(data); // hardware-accelerated SSE4.2 / ARM CRC32C
uint xx32 = XxHash32.Compute(data);
ulong fnv = Fnv1a64.Compute(data);
byte[] sha256 = Sha256.Compute(data);using Compression.Core.BitIO;
using var ms = new MemoryStream();
var writer = new BitWriter(ms, BitOrder.MsbFirst);
writer.WriteBits(0b1011_0010, count: 8);
writer.WriteBits(0xF, count: 4);
writer.Flush();
ms.Position = 0;
var reader = new BitReader(ms, BitOrder.MsbFirst);
int byteVal = reader.ReadBits(8); // 0xB2
int nibble = reader.ReadBits(4); // 0xFforeach (var bb in BuildingBlockRegistry.All)
Console.WriteLine($"{bb.Id,-18} {bb.Family,-14} {bb.Description}");The benchmark tool in the source repo (cwb benchmark <file>) iterates this same list and ranks every
algorithm by ratio + compress / decompress wall time on a single input.
| You want… | Pick |
|---|---|
| Fastest possible round-trip on logs / JSON / mixed data | BB_Lz4, BB_Snappy, BB_Pithy, BB_Lzfx, BB_Lzo, BB_Lzjb |
| Best ratio with reasonable CPU budget (general purpose) | BB_Lzma, BB_Brotli, BB_Zstd-via-FSE |
| Best ratio at any CPU cost (research / archival) | BB_PAQ8, BB_Cmix, BB_Mcm, BB_Bsc, BB_Csc, BB_Zpaq |
| Streamable byte-level codec (network / pipe) | BB_Deflate, BB_Lz4, BB_Snappy |
| Lossless image-friendly | BB_Deflate (PNG layer), BB_Lzma (lossless), BB_Brotli (modern web) |
| Domain transform first, then entropy code | BB_Bwt → BB_Mtf → BB_Huffman (the bzip2 stack) |
| Adaptive entropy stage | BB_Arithmetic, BB_RangeCoding, BB_rANS, BB_FSE |
| Static entropy stage (precomputed table) | BB_Huffman, BB_Tunstall, BB_ShannonFano |
| Geometric integer distributions | BB_Golomb, BB_Rice, BB_ExpGolomb, BB_EliasGamma, BB_EliasDelta |
| Universal integer codes (no model) | BB_Unary, BB_Fibonacci, BB_Levenshtein |
| DNA / 4-symbol alphabets | BB_Dna (specialised) |
| Hardware-friendly chunked compression | BB_842 (IBM hardware-style 2 / 4 / 8-byte template matching) |
| Network-protocol-style framing (packet sizes) | BB_Lzs (Stac LZS — RFC 1967 / RFC 2395) |
| Templated grammar / repetition mining | BB_RePair, BB_Bpe, BB_Lzwl |
When in doubt: run cwb benchmark from the source repo on a representative sample of your data and let
the numbers pick.
Every entry below registers via IBuildingBlock, exposes both Compress and Decompress, and was
verified via the project's round-trip test matrix. Each row links to the algorithm's primary reference
or canonical specification.
| Id | Algorithm | Family | State | Description | Reference |
|---|---|---|---|---|---|
BB_Deflate |
DEFLATE | Dictionary | R/W | LZ77 + Huffman, the algorithm inside gzip / zip / png | RFC 1951 |
BB_Deflate64 |
Deflate64 | Dictionary | R/W | Enhanced DEFLATE with 64 KB window | MS-ZIP |
BB_Lz77 |
LZ77 | Dictionary | R/W | Sliding-window dictionary with distance / length tokens | Ziv & Lempel 1977 |
BB_Lz78 |
LZ78 | Dictionary | R/W | Phrase-dictionary precursor to LZW | Ziv & Lempel 1978 |
BB_Lzw |
LZW | Dictionary | R/W | Dictionary coding used in GIF / Unix compress |
Welch 1984 |
BB_Lzo |
LZO1X | Dictionary | R/W | Extremely fast, decompression-optimised | oberhumer.com |
BB_Lzss |
LZSS | Dictionary | R/W | LZ77 variant with flag-bit encoding | Storer & Szymanski 1982 |
BB_Lz4 |
LZ4 | Dictionary | R/W | Extremely fast LZ77-family block compression | LZ4 block format |
BB_Snappy |
Snappy | Dictionary | R/W | Fast LZ77-family compression (Google) | Snappy format |
BB_Brotli |
Brotli | Dictionary | R/W | LZ77 + Huffman with static dictionary (Google) | RFC 7932 |
BB_Lzma |
LZMA | Dictionary | R/W | LZ + Markov chain + range coding | 7-Zip LZMA SDK |
BB_Lzx |
LZX | Dictionary | R/W | LZ77 + Huffman used in CAB / CHM / WIM | MS-PATCH LZX |
BB_Xpress |
XPRESS Huffman | Dictionary | R/W | Windows XPRESS (NTFS / WIM / Hyper-V) | MS-XCA |
BB_Lzh |
LZH (LH5) | Dictionary | R/W ¹ | Lempel-Ziv with adaptive Huffman, used in LHA | LZH format doc |
BB_Arj |
ARJ | Dictionary | R/W ¹ | Modified LZ77 + Huffman from ARJ archives | ARJ technote |
BB_Lzms |
LZMS | Dictionary | R/W | LZ + Markov + Shannon delta (Windows WIM / ESD) | MS-XCA LZMS |
BB_Lzp |
LZP | Dictionary | R/W | Lempel-Ziv Prediction with context-based match prediction | Bloom 1996 |
BB_Ace |
ACE | Dictionary | R/W | LZ77 + Huffman from ACE archive format | unace-nonfree |
BB_Rar |
RAR5 | Dictionary | R/W | LZ + Huffman + PPM from RAR5 | rarlab technote |
BB_Sqx |
SQX | Dictionary | R/W ¹ | LZ + Huffman from the SQX archive format | SqxFormat notes |
BB_ROLZ |
ROLZ | Dictionary | R/W | Reduced-Offset LZ with context-based match tables | encode.su |
BB_LZHAM |
LZHAM | Dictionary | R/W | LZ77 + Huffman, inspired by LZHAM codec | LZHAM repo |
BB_Lzs |
LZS | Dictionary | R/W | Stac LZS (7 / 11-bit offset LZSS for networking) | RFC 1967 / RFC 2395 |
BB_Lzwl |
LZWL | Dictionary | R/W | LZW with variable-length initial alphabet from digram analysis | LZWL paper |
BB_RePair |
Re-Pair | Dictionary | R/W | Recursive Pairing, offline grammar-based compression | Larsson & Moffat 1999 |
BB_842 |
842 | Dictionary | R/W | IBM 842 hardware compression with 2 / 4 / 8-byte template matching | Linux crypto/842* |
BB_PPM |
PPM | Context-Mixing | R/W | Prediction by Partial Matching, order-2 context modelling | Cleary & Witten 1984 |
BB_CTW |
CTW | Context-Mixing | R/W | Context Tree Weighting — optimal universal compression | Willems 1995 |
BB_Huffman |
Huffman | Entropy | R/W | Optimal prefix-free entropy coding | Huffman 1952 |
BB_Arithmetic |
Arithmetic | Entropy | R/W | Order-0 arithmetic coding with frequency table | Witten / Neal / Cleary 1987 |
BB_ShannonFano |
Shannon-Fano | Entropy | R/W | Recursive frequency-splitting precursor to Huffman | Shannon 1948 |
BB_Golomb |
Golomb / Rice | Entropy | R/W | Optimal coding for geometric distributions | Golomb 1966 |
BB_Fibonacci |
Fibonacci | Entropy | R/W | Universal code with 11 terminators (Zeckendorf) |
Apostolico & Fraenkel 1987 |
BB_FSE |
FSE / tANS | Entropy | R/W | Table-based ANS (Zstd's entropy stage) | Duda 2013 / Collet's blog |
BB_BPE |
Byte Pair Encoding | Entropy | R/W | Iterative most-frequent-pair replacement | Gage 1994 |
BB_RangeCoding |
Range coding | Entropy | R/W | Byte-oriented arithmetic coding with carryless normalisation | Martin 1979 |
BB_rANS |
rANS | Entropy | R/W | Range ANS, used in AV1 and LZFSE | Duda 2013 |
BB_ExpGolomb |
Exp-Golomb | Entropy | R/W | Exponential Golomb, used in H.264 / H.265 | Teuhola 1978 |
BB_Unary |
Unary | Entropy | R/W | Simplest universal code: N ones followed by a zero | — |
BB_EliasGamma |
Elias gamma | Entropy | R/W | Universal code using a unary length prefix | Elias 1975 |
BB_EliasDelta |
Elias delta | Entropy | R/W | Gamma-codes the bit length | Elias 1975 |
BB_Levenshtein |
Levenshtein | Entropy | R/W | Self-delimiting universal code with recursive length prefixing | Levenshtein 1968 |
BB_Tunstall |
Tunstall | Entropy | R/W | Variable-to-fixed code, dual of Huffman | Tunstall 1967 (PhD thesis) |
BB_Dmc |
DMC | Entropy | R/W | Dynamic Markov Compression, bit-level FSM with state cloning | Cormack & Horspool 1987 |
BB_Bwt |
BWT | Transform | R/W | Burrows-Wheeler Transform, reorders bytes for better compression | Burrows & Wheeler 1994 |
BB_Mtf |
MTF | Transform | R/W | Move-to-Front Transform | Bentley et al. 1986 |
BB_Delta |
Delta | Transform | R/W | Delta filter, stores differences between consecutive bytes | — |
BB_Rle |
RLE | Transform | R/W | Run-Length Encoding | — |
BB_Dpcm |
DPCM | Transform | R/W | Differential PCM, sample-to-sample differences | — |
¹ Known edge cases on certain pathological inputs — see the source repo's known-fixes log. The mainline round-trip works for all real-world data we've tested.
| Module | Surface | What's in it |
|---|---|---|
Compression.Core.BitIO |
BitReader, BitWriter, BitBuffer, BitOrder |
LSB / MSB-first bit streams over Stream. Foundation for every entropy / variable-length coder in the package. |
Compression.Core.Hashing |
Crc32, Crc32C, Adler32, XxHash32, Fletcher16/32/64, MurmurHash3, Fnv1 / Fnv1a, Sha1, Sha256, Md5, Blake2b |
Non-cryptographic checksums and a few crypto hashes. CRC-32C uses SSE4.2 / ARM CRC32C instructions where available, with a slicing-by-4 software fallback. |
Compression.Core.Crypto |
AesCbc, AesCtr, Blowfish, Pbkdf2, ZipCrypto |
Format-required ciphers (ZipCrypto for legacy ZIP, AES-256 for ZIP-AE). |
Compression.Core.DataStructures |
SlidingWindow, PriorityQueue, Trie, SuffixTree |
Generic structures used by match finders and grammar coders. |
Compression.Core.Deflate |
DeflateEncoder, DeflateDecoder, Zopfli, MsZip |
DEFLATE pipeline plus Zopfli's iterative ratio-optimisation pass and Microsoft's MS-ZIP variant. |
Compression.Core.Dictionary.* |
One namespace per dictionary algorithm | LZ77 / LZ78 / LZW / LZSS / LZ4 / Snappy / Brotli / LZMA / LZX / XPRESS / LZH / ARJ / LZMS / LZP / ACE / RAR5 / SQX / LZO1X / LZHAM / LZS / LZWL / Re-Pair / 842, plus building-block variants (LZJB / LZF / LZRW1 / LZRW3 / FastLZ / Lizard / QuickLZ / Pithy / Crush / LZG / LZRLE / LZMAT / Shoco / DnaCompression). |
Compression.Core.Entropy.* |
One namespace per entropy coder | Huffman (canonical) / Shannon-Fano / Arithmetic / Range / FSE / rANS / PPMd / context-mixing (PAQ8 / cmix / MCM / BCM / BSC / BALZ / CSC / Zling / Lizard / QuickLZ-as-entropy) / Golomb / Rice / Elias Gamma / Elias Delta / Fibonacci / Tunstall / Exp-Golomb / Unary / Levenshtein / DMC. |
Compression.Core.Transforms |
Bwt, Mtf, Rle, Delta, Bcj, Bcj2, Bpe, Dpcm, RePair |
Reversible byte-rearrangement passes — staged before entropy coding. BCJ family is platform-specific branch-relative-to-absolute conversion (x86 / ARM / Thumb / PPC / SPARC / IA64). |
Compression.Core.Simd |
SimdMatchLength, SimdMemCopy, SimdHistogram, SimdRunScan |
Vector-accelerated primitives for hot match-finding paths. |
Compression.Core.Streams |
SubStream, ConcatStream, etc. |
Lightweight stream shims commonly needed when writing format readers. |
Compression.Core.DiskImage |
MbrParser, GptParser, PartitionTypeDatabase, PartitionEntry |
Header-only parsers for partition tables — handy when dealing with disk-image archives. |
Compression.Core.Progress |
IProgress<>, Progress helpers |
Cancellable progress reporting for long compressors. |
Versions follow MAJOR.MINOR.PATCH.BUILD where BUILD is git rev-list --count HEAD of the source
repo, auto-stamped at release time by scripts/version.pl. The package is pre-1.0 — public
surface may rearrange between releases. Once we ship 1.0.0, breaking changes follow SemVer
(MAJOR bump).
If you need a pinned reference, lock the exact 4-segment version:
<PackageReference Include="Compression.Core" Version="[1.0.0.123]" />Lives in Hawkynt/CompressionWorkbench. The source repo carries:
- A NUnit test suite (~10 000 cases including round-trip sweeps for every building block, the
cwb benchmarkmatrix, and an external-tool interop suite that round-trips throughgzip/7z/xzetc.) - A WPF UI (
Compression.UI) with a benchmark visualiser and an archive browser - A CLI (
cwb) withcompress/extract/benchmark/analyse/auto-extract/suggestsubcommands - ~215 archive / file-system / image-format readers built on top of these primitives
Open an issue or PR on the source repo. The CI runs the full test matrix on Windows and Linux and gates merges on a green test run.
LGPL-3.0-or-later. See LICENSE in the source repo.