A fully clean-room C# implementation of compression primitives, archive file formats, and analysis tools. Every algorithm is implemented from scratch using no external compression source code — only our own primitives.
Install via NuGet — pick the surface you need:
```sh
dotnet add package Hawkynt.Compression.Core        # primitives only
dotnet add package Hawkynt.FileFormats.Audio       # + audio codecs / containers
dotnet add package Hawkynt.FileFormats.Archives    # + zip / tar / 7z / and the long tail
dotnet add package Hawkynt.FileFormats.FileSystems # + FAT / ext / NTFS / VHD / VMDK / etc.
```

Use the CLI — cwb is a self-contained single-file executable, no .NET runtime needed:
```sh
cwb list mystery.bin                     # auto-detect format and list contents
cwb extract photos.tar.gz                # auto-detect compression chain + extract
cwb analyze unknown.bin                  # entropy heatmap + signature scan + trial decompression
cwb benchmark sample.txt                 # compare every building block on your data
cwb auto-extract sample.vhd --recursive  # disk → partition → filesystem → file
```

Read the per-package format reference to find out what's actually supported, audited against the real source code (R / WORM / R/W states, upstream spec links, limitations): Archives · Audio · FileSystems · Building blocks.
CompressionWorkbench exists to answer two kinds of questions about compressed and packaged data, entirely in managed .NET with no native dependency on zlib, liblzma, libarchive, or any other third-party compression library:
- "What is this, and what is inside?" — given an arbitrary blob of bytes, identify the format, slice it into its logical payloads, and recover the original data.
- "How does the algorithm work, and how does it compare?" — provide a reference implementation of every major compression primitive, from LZ77 through arithmetic coding to modern neural / context-mixing compressors, so the algorithms can be read, benchmarked, and taught from a single codebase.
Concretely that means:
- Clean-room, from-scratch C#. Every primitive — bit I/O, Huffman, range coding, LZ family, BWT/MTF, PPM, context mixing, modern ANS/FSE — is written from the original specification or from a clean reverse of the reference algorithm. No line of native compression code is linked in or ported.
- Every common container, read and written wherever a spec exists to write against honestly. When the writer cannot match an external spec (proprietary element streams, missing on-disk structures), that is documented in the support tables instead of shipping a silent toy.
- Every multi-payload container treated as an archive. The distinction that matters to a user is "can I list and extract the N things inside?", not "is this called ZIP". That makes PE resource DLLs, multi-page TIFFs, font collections, multi-frame GIFs, PSD layer stacks, and MPEG transport streams all first-class archives — see Archives and Pseudo-archives below.
- Analysis as a first-class surface. Identification, entropy mapping, trial decompression, chain reconstruction, signature scanning, and cross-validation against external tools are exposed through a library (Compression.Analysis), a CLI (cwb), and a UI visualiser — not as an afterthought.
- Benchmarking at the primitive level. The benchmark compares the building blocks — raw algorithms without container overhead — so ratio/speed numbers reflect the algorithm, not the envelope.
- One library, many surfaces. CLI archiver (cwb), UI browser + analyser, Explorer shell integration, self-extracting stubs (Compression.Sfx.*), and a library any .NET consumer can link.
Any format that packages N discrete, separately-addressable payloads is an archive.
A format earns archive treatment — the IArchiveFormatOperations contract (List / Extract / optional Create) — whenever its binary layout contains:
- A directory or index of named or indexed entries, and
- Each entry can be extracted as an independent blob, and
- A consumer might plausibly want one entry without the others.
This is true regardless of whether the entries happen to be files, images, pages, frames, tracks, layers, tables, fonts, strings, or other domain objects. The contents of an extracted blob remain domain-specific (a TIFF page is still a TIFF, an RT_ICON resource is still an icon), but that is a property of the payload, not of the container.
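To make the contract concrete, here is a minimal sketch of that List / Extract surface. The interface name is the one this README uses; the member signatures and the `ArchiveEntry` record are illustrative assumptions, not the library's actual API.

```csharp
using System.Collections.Generic;
using System.IO;

// Sketch only — the real signatures may differ.
public interface IArchiveFormatOperations
{
    // The directory/index of named or indexed entries...
    IEnumerable<ArchiveEntry> List(Stream archive);

    // ...each of which can be extracted as an independent blob.
    void Extract(Stream archive, ArchiveEntry entry, Stream destination);
}

// Hypothetical entry descriptor: a name (or index) plus enough location
// information to slice the payload out of the container.
public sealed record ArchiveEntry(string Name, long Offset, long Length);
```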
Formats in the canonical archive sense — ZIP, TAR, 7z, RAR, CAB, CPIO, and their relatives. They were designed as "a bag of files with a directory". The exhaustive per-format reference (extensions, R / WORM / R/W state, upstream spec link, limitations) is in Hawkynt.FileFormats.Archives/README.md.
Formats that are archives by structure but have never been presented that way in ordinary file managers. CompressionWorkbench slices each one along its natural payload boundary and exposes the same List / Extract surface as ZIP.
State columns are audited against the actual IArchiveCreatable / IArchiveModifiable implementations, not advertised intent. Where one bullet covers multiple projects with different states (e.g. ICO is WORM, ANI is R), each project's state is shown explicitly.
| Container | State | Entries become | Where shipped |
|---|---|---|---|
| PE resource DLLs/EXEs | PeResources=R, ResourceDll=WORM | one entry per resource: RT_GROUP_ICON → .ico, RT_BITMAP → .bmp, RT_MANIFEST → .xml, RT_STRING → .txt, RT_VERSION → .rcv, raw RT_RCDATA | FileFormat.PeResources, FileFormat.ResourceDll |
| ICO / CUR / ANI | Ico=WORM, Ani=R | one entry per ICONDIRENTRY → .png / .bmp (cursor adds hotspot) | FileFormat.Ico, FileFormat.PngCrushAdapters.Ani |
| Multi-page TIFF / BigTIFF | sibling-provided | one single-page .tif per IFD | FileFormat.PngCrushAdapters.Tiff / BigTiff |
| Multi-frame GIF / MNG / FLI / DCX | sibling-provided | one .gif / .png per frame | FileFormat.Gif, PngCrushAdapters.{Mng,Fli,Dcx} |
| Animated PNG (APNG) | sibling-provided | one .png per frame with dispose/blend applied against previous frames | FileFormat.PngCrushAdapters.Apng |
| Icon containers (ICNS, MPO) | sibling-provided | Apple icon suite / stereoscopic JPEG pair | FileFormat.PngCrushAdapters.{Icns,Mpo} |
| Font collections (TTC / OTC) | R | one .ttf / .otf per member font | FileFormat.FontCollection |
| Single-font (TTF / OTF) | R | per-glyph entries (cmap + glyf slicing; CFF/OpenType passes through) | FileFormat.FontCollection.Ttf |
| Gettext MO / PO | R | one .txt per msgid/msgstr pair | FileFormat.Gettext |
| WAV / FLAC / MP3 | WAV=WORM, FLAC=WORM, MP3=WORM | full file + per-channel WAV + ID3v2/RIFF metadata + APIC cover art | FileFormat.Wav, FileFormat.Flac, FileFormat.Mp3 |
| Ogg | R | per-logical-stream packets + Vorbis/Opus comments | FileFormat.Ogg |
| MP4 / MOV / MKV / WebM | R | demuxed tracks (H.264 → Annex-B), attachments, chapters | FileFormat.Mp4, FileFormat.Matroska |
| MPEG Transport Stream | R | per-PID elementary streams (video/audio/data) | FileFormat.MpegTs |
| Blu-ray PGS (SUP) | R | subtitle segments grouped by epoch | FileFormat.Sup |
| VobSub (DVD) | R | .idx metadata + per-entry slices of the sibling .sub PES stream | FileFormat.VobSub |
| HLS M3U8 | R | segment list with per-variant metadata | FileFormat.M3u8 |
| U-Boot uImage, FDT/DTB, UEFI FV | R | firmware header metadata + decompressed payload or per-FFS/property entries | FileFormat.UImage, FileFormat.Dtb, FileFormat.UefiFv |
| Device executable packers | R | the packer's metadata.ini (detection evidence) + packed_payload.bin (or in-process decompressed body for UPX) | FileFormat.ExePackers |
Formats that cannot produce multiple addressable entries stay in FormatCategory.Stream rather than falsely advertising themselves as archives. IArchiveFormatOperations.List is free to return a single "whole payload" entry for stream-style containers (and does, for formats like PAQ8 or the audio-stream-as-archive descriptors), but a format that would have to fake an index has no business claiming SupportsMultipleEntries.
The solution uses the .slnx XML format. Core / library / tooling projects sit at the
repository root; the ~360 individual format projects are grouped into three subdirectories
by domain. Three meta-package projects bundle them into NuGet drops.
```
CompressionWorkbench.slnx
 |
 +-- Compression.Core                  Primitives, building blocks, SIMD, partition parsers
 +-- Compression.Registry              Interfaces (IFormatDescriptor, IBuildingBlock) + registries
 +-- Compression.Registry.Generator    Roslyn source generator for auto-discovery
 +-- Compression.Lib                   Umbrella library: detection, archive ops, SFX hosting
 +-- Compression.Analysis              Binary analysis engine (signatures, entropy, trial decomp)
 +-- Compression.CLI                   cwb command-line tool (System.CommandLine v3)
 +-- Compression.UI                    WPF browser + analyser + heatmap + wizard
 +-- Compression.Shell                 Explorer context-menu integration
 +-- Compression.Sfx.Cli               Self-extracting archive stub (console)
 +-- Compression.Sfx.Ui                Self-extracting archive stub (GUI)
 +-- Compression.Tests                 NUnit test project
 |
 +-- Hawkynt.FileFormats.Audio/        Meta-package: bundles every Codec.* + audio FileFormat.*
 +-- Hawkynt.FileFormats.Archives/     Meta-package: every archive / compression-stream / pseudo-archive
 +-- Hawkynt.FileFormats.FileSystems/  Meta-package: every filesystem + disk-image container
 |
 +-- Codecs/Codec.*/                   Standalone audio codecs (PCM / FLAC / A-law / μ-law / GSM /
 |                                       ADPCM / MIDI / MP3 / Vorbis / Opus / AAC)
 +-- FileFormats/FileFormat.*/         One project per archive / stream / pseudo-archive / packer
 +-- FileSystems/FileSystem.*/         One project per filesystem image format
```
Adding a new format is a four-step process:
- Create the project under the right bucket: `FileFormats/FileFormat.<Name>/` for an archive / compression stream / pseudo-archive, `FileSystems/FileSystem.<Name>/` for a filesystem image, or `Codecs/Codec.<Name>/` for an audio codec. Add a class implementing `IFormatDescriptor` plus the appropriate operations interface (`IStreamFormatOperations` or `IArchiveFormatOperations`, plus optionally `IArchiveCreatable` for WORM or `IArchiveModifiable` for full R/W) — see the sketch after this list.
- Add a `<ProjectReference>` from `Compression.Lib.csproj` so the source generator picks it up.
- Add a `<ProjectReference PrivateAssets="all" />` to the matching meta-package csproj (`Hawkynt.FileFormats.{Audio,Archives,FileSystems}/Hawkynt.FileFormats.*.csproj`) so the format ships in its NuGet meta-package and the meta README's reference table can include it.
- Add the project to `CompressionWorkbench.slnx`.
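A minimal sketch of step 1 — the interface names are the repo's, but every member shown is an illustrative assumption (it reuses the hypothetical `ArchiveEntry` record from the contract sketch above), not the actual contract:

```csharp
// Hypothetical skeleton for FileFormats/FileFormat.Foo/FooDescriptor.cs.
// Member names/signatures are assumptions for illustration only; consult the
// real IFormatDescriptor / IArchiveFormatOperations definitions before use.
using System.Collections.Generic;
using System.IO;

public sealed class FooDescriptor : IFormatDescriptor, IArchiveFormatOperations
{
    // "FOO1" is a made-up magic-byte signature for the detector to key on.
    public IReadOnlyList<byte[]> Signatures { get; } = new[] { "FOO1"u8.ToArray() };

    public IEnumerable<ArchiveEntry> List(Stream archive)
    {
        // Parse the container's directory/index here.
        yield break;
    }

    public void Extract(Stream archive, ArchiveEntry entry, Stream destination)
    {
        // Seek to the entry's payload and decode it into destination.
    }
}
```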
The Roslyn source generator (Compression.Registry.Generator) discovers every implementation
at compile time and emits the registration table. No reflection, no hand-maintained switch
statements, no init hooks.
For more detail on conventions, testing, and the registry mechanism, see CONTRIBUTING.md.
| Concern | Choice |
|---|---|
| Language | C# 14 / .NET 10 |
| Solution | .slnx (XML solution format) |
| Testing | NUnit |
| GUI | WPF |
| CLI | System.CommandLine v3 |
| Discovery | Roslyn source generator (zero-reflection format/block registration) |
| Bundling | Costura.Fody single-file embedding for CLI/UI/SFX |
The state shown for each format in the meta-package READMEs is audited against the actual
source code — IArchiveCreatable and IArchiveModifiable interface implementations,
FormatCapabilities.CanCreate / CanModify flags — not advertised intent.
| State | Meaning | Source-code signal |
|---|---|---|
| Unsupported | No descriptor exists. | — |
| R | Read-only: can List / Extract / Test; no creation. | IArchiveFormatOperations only |
| WORM | Write-Once-Read-Many: can produce a fresh archive / image, but cannot modify in place. | IArchiveCreatable (or FormatCapabilities.CanCreate) |
| R/W | Can also add / replace / remove entries inside an existing archive with consistent free-space bookkeeping. | IArchiveModifiable (or FormatCapabilities.CanModify) |
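In code, that audit reduces to interface checks. A sketch of the mapping (the `SupportState` enum and helper are made up for illustration; only the interface names come from the repo):

```csharp
// Maps a descriptor's implemented interfaces onto the states in the table.
enum SupportState { R, Worm, ReadWrite }

static SupportState StateOf(IFormatDescriptor descriptor) => descriptor switch
{
    IArchiveModifiable => SupportState.ReadWrite, // R/W: can edit in place
    IArchiveCreatable  => SupportState.Worm,      // WORM: fresh writes only
    _                  => SupportState.R          // R: list/extract/test
};
```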
The raw algorithm primitives registered via IBuildingBlock live in Hawkynt.Compression.Core
and are published as the foundation NuGet package. Three sister meta-packages add format
coverage on top — all version-locked 1:1, all built from the same source repo, all
single-responsibility:
A fifth sister package — Hawkynt.FileFormats.Images — lives in the
Hawkynt/PNGCrushCS sibling repo and supplies
image-format coverage (PNG / JPEG / TIFF / APNG / etc.). Together the five packages form one
cohesive surface without any one dragging in the others' transitive baggage.
Where each per-format detail table lives:
- Compression.Core/README.md — every building block (Dictionary / Entropy / Transform / Context-Mixing families) with reference papers and known-edge-case notes. The canonical "how / when / why" for picking a compression primitive.
- Hawkynt.FileFormats.Archives/README.md — the ~190 archive / stream / pseudo-archive descriptors with R / WORM / R/W state per format.
- Hawkynt.FileFormats.Audio/README.md — every codec + container with its production / partial / framing-only state honestly documented.
- Hawkynt.FileFormats.FileSystems/README.md — filesystems, disk-image containers, firmware images, plus the WSL-validated external-tool matrix and the forensic-recovery carver API.
CompressionWorkbench exposes the same core library through five different surfaces. Pick the one that fits the task.
Universal archive tool with smart conversion, optimal re-encoding, benchmarking, and analysis built in.
| Command | Alias | What it does |
|---|---|---|
| `list <archive>` | `l` | List contents of an archive |
| `extract <archive> [files...]` | `x` | Extract files from an archive |
| `create <archive> <files...>` | `c` | Create a new archive |
| `test <archive>` | `t` | Test archive integrity |
| `info <archive>` | - | Show detailed archive information |
| `convert <input> <output>` | - | Convert between archive formats |
| `optimize <input> <output>` | `opt` | Re-encode with optimal compression |
| `benchmark <file>` | `bench` | Benchmark all building blocks on the supplied data |
| `analyze <file>` | - | Run binary analysis (detection + entropy + trial decompress) |
| `auto-extract <file>` | - | Recursive nested extraction (see below) |
| `batch <dir>` | - | Scan a directory in parallel and aggregate format stats |
| `suggest <file>` | - | Platform-aware format recommendation |
| `recover <image>` | - | Forensic carving — finds embedded filesystems + files in damaged disk images. `--mode auto\|filesystems\|files`, `--recursive` walks nested wrappers (e.g. ZIP→VHD→MBR→FAT). |
| `visualize <file>` | - | Renders a colored block map of every detected envelope (FAT/ext/NTFS/MBR/...) stacked by depth. `--format ascii\|svg\|html` |
| `carve <file>` | - | Photorec-style file carver (JPEG/PNG/MP4/ZIP/... at any offset, including in slack space) |
| `reverse-engineer <tool>` | `reveng` | Black-box probing of an unknown compression tool |
| `tool (init\|list\|add\|run\|remove)` | - | Manage external-tool templates |
| `formats` | - | List all supported formats |
Examples:
```sh
cwb list archive.zip
cwb extract archive.7z -o ./output
cwb x archive.rar -p mypassword
cwb create output.zip myDir file1.txt *.txt
cwb create output.7z file.txt --method lzma2+
cwb convert input.tar.gz output.tar.xz
cwb optimize input.zip optimized.zip
cwb benchmark largefile.bin
cwb analyze unknown.bin
cwb auto-extract sample.vhd --recursive
cwb suggest big.csv   # "→ consider zstd -19 (columnar/text, moderate entropy)"
```

3-tier conversion model. cwb convert picks the cheapest strategy that preserves data:
| Tier | Strategy | Example |
|---|---|---|
| 1 | Bitstream transfer (zero decompression) | .gz ↔ .zlib, .zip ↔ .gz |
| 2 | Container restream (decompress wrapper only) | .tar.gz → .tar.xz |
| 3 | Full recompress (extract + re-encode) | .zip → .7z |
Method+ system. Append + to any method name for optimal encoding: deflate+ uses Zopfli, lzma+ uses Best, lz4+ uses HC.
Tool templates. cwb tool registers external CLI tools (7z, binwalk, file, trid, …) in ~/.cwb-tools.json. Templates use {input}, {output}, {outputDir} placeholders and can capture stdout, pipe stdin, or set a timeout. cwb tool init pre-populates templates for common tools.
The archive browser is the conventional half: file list with icons, columns (name, size, compressed, ratio, method, modified), open / extract / create / test flows, preview window (text + hex), properties dialog with compression-ratio visualisation, benchmark tool, and Explorer context-menu integration (Compression.Shell).
UI niceties that match power-user expectations from 7-Zip / Total Commander:
- ".." everywhere — navigates up one folder; at archive root it exits to OS-browser mode rooted at the archive's containing folder, so you can keep walking up the filesystem like 7z does.
- Auto-descent into nested archives — double-clicking a file inside an archive that's itself an archive (e.g. a .vhd inside a .zip) opens it as a new archive context. ".." pops back to the parent. Guarded by content-hash dedup + a max-depth-16 cap so a malformed file detected as containing itself doesn't loop forever.
- Drag in / drag out — drop files on the window to open them or add them to the open archive (auto-detects); drag entries out of the list to copy them into Explorer or any drop target.
- Last-folder restore — relaunching the app reopens the OS browser at the last folder a file was opened from. If that folder was deleted in the meantime, it walks up parents until one exists, falling back to %USERPROFILE%.
- All file-type filters — the Open dialog dropdown lists "All Archives" + one entry per registered descriptor (auto-discovered, alphabetically sorted), so you can narrow to e.g. "ZIP archive (.zip)" or "VMDK virtual disk (.vmdk)" with one click.
The analyser is the interesting half. When you drop an unknown binary on the UI, it never says "unsupported" — it shows you what the bytes look like. The Binary Analysis wizard has a toolbar that walks you through progressively deeper investigation:
- Scan Results — every registered magic-byte signature that matches, with offsets and confidence.
- Fingerprints — algorithm identification from byte-distribution and byte-pair statistics.
- Entropy Map — per-region entropy profile with CUSUM change-point detection and 1D-Canny edge sharpening. Structured data (text, tables) shows low entropy; compressed/encrypted regions show high entropy; boundaries between them are marked.
- Trial Decompress — runs every registered stream decompressor in parallel with per-trial timeout and early-terminates on a low-entropy output. If any decoder produces plausible output, it is offered for preview.
- Chain — multi-layer compression reconstruction (e.g. gzip(bzip2(data))). Recursive trial decompression continues until entropy stops dropping.
- Statistics — full byte distribution, bigram histogram, chi-square randomness test, longest run, run-length distribution.
- Strings — ASCII / UTF-8 / UTF-16 string search with regex support.
- Structure — ImHex/010-style .cwbt templates. Built-in templates ship for ZIP, PNG, BMP, ELF, Gzip; you can write your own using u8-u64 / i8-i64 / f16-f64 (LE/BE), char/u8 arrays, BCD, fixed-point, color, date/time, and network types with dynamic length via field references or repeat-to-EOF.
The Heatmap Explorer is the visual first pass. A 16×16 colour grid divides the file into 256 tiles; each cell represents a proportional region of the file.
| Cell colour | Meaning | Entropy |
|---|---|---|
| Blue | Low entropy — zeros, padding, simple headers | 0.0–3.0 |
| Green | Structured data — tables, records, text | 3.0–5.5 |
| Orange | Compressed data | 5.5–7.5 |
| Red | Random / encrypted (incompressible) | 7.5–8.0 |
| Purple | A known format signature was detected here | any |
Click any cell to subdivide into another 16×16 grid — it recursively zooms in on a region. Hovering shows offset, size, entropy, unique-byte count, and the detected signature (if any). Extract on a purple cell saves just that region to a file. The explorer only samples each block, so it handles arbitrarily large files without loading them into memory. Accessible from the analyser's "Heatmap" tab.
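Under the hood, each tile's colour is just Shannon entropy over the sampled bytes. A self-contained sketch of that calculation and the bands from the table above (not the UI's actual code):

```csharp
using System;

static class TileEntropy
{
    // Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform random).
    public static double Compute(ReadOnlySpan<byte> block)
    {
        if (block.IsEmpty) return 0;
        Span<int> counts = stackalloc int[256];
        foreach (byte b in block) counts[b]++;
        double h = 0;
        foreach (int c in counts)
        {
            if (c == 0) continue;
            double p = (double)c / block.Length;
            h -= p * Math.Log2(p);
        }
        return h;
    }

    // The colour bands from the table above (purple overrides on signature hits).
    public static string Colour(double h) => h switch
    {
        <= 3.0 => "blue",   // zeros, padding, simple headers
        <= 5.5 => "green",  // structured data — tables, records, text
        <= 7.5 => "orange", // compressed data
        _      => "red"     // random / encrypted
    };
}
```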
Everything the UI exposes is available as a .NET library under Compression.Analysis:
- Signature Scanner — magic-byte detection for every registered format (hash-indexed, O(n)).
- Algorithm Fingerprinting — statistical fingerprinting against known compression-output distributions.
- Trial Decompression — TryAllAsync runs every registered stream decompressor in parallel with per-trial timeout and early termination (see the sketch after this list).
- Chain Reconstruction — discovers layered compression.
- Entropy Mapping — per-region entropy profiling with boundary detection; multi-resolution entropy pyramid (64 KB / 8 KB / 1 KB / 256 B), CUSUM binary segmentation, KL-divergence + chi-square boundary validation, 1D-Canny edge sharpening.
- String Extraction — ASCII / UTF-8 / UTF-16 with regex.
- Structure Templates — .cwbt template language.
- Streaming Analysis — reads the first 64 KB for magic/header; computes entropy in 64 KB chunks; returns per-chunk entropy profiles for arbitrarily large files.
- Black-box tool integration — ExternalToolRunner, ToolOutputParser, CrossValidator, FallbackDecompressor with auto-discovery of tools on PATH.
- AutoExtractor — recursive nested extraction: archives inside archives, disk images → partition tables → filesystems → files. Configurable max depth (default 5) and file-size limits.
- BatchAnalyzer — parallel directory scan with aggregate format statistics.
- FileCarver / FileCarverOutputSink — photorec-style flat magic-scan carving for damaged dumps. Streams 1 MB windows with 64 KB overlap; never materialises multi-GB images.
- FilesystemCarver / FilesystemExtractor — finds filesystem superblocks anywhere in a stream (ext at +1080, FAT at +54/+82, XFS "XFSB" at 0, Btrfs at 0x10020, …), validates each via the matching reader's List(), extracts contents per-file with isolated error handling.
- RecursiveFilesystemCarver — descends through wrapper chains: VHD → MBR → FAT → file.zip etc. Each NestedHit carries its EnvelopeStack lineage so consumers know what's wrapping what.
- BlockMap / BlockMapRenderer — colored visualization of envelope stacks. ASCII / SVG / HTML output with per-format palette (ext = green, FAT = orange, NTFS = blue, Btrfs = teal, XFS = red, MBR/GPT = grey, QCOW2/VMDK/VHD = purples). Used by cwb visualize.
- PayloadCarver, StringsExtractor, EntropyHeatmap — standalone helpers.
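A hedged usage sketch of the trial-decompression surface named above: TryAllAsync is the method this README names, but the hosting type `TrialDecompressor` and the result shape below are assumptions, not the real API.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Compression.Analysis; // real namespace; the API shape below is assumed

byte[] bytes = await File.ReadAllBytesAsync("unknown.bin");

// Run every registered stream decompressor in parallel with a per-trial timeout.
var trials = await TrialDecompressor.TryAllAsync(bytes, timeout: TimeSpan.FromSeconds(2));

foreach (var t in trials)
    Console.WriteLine($"{t.BlockName}: plausible={t.Succeeded}, entropy={t.OutputEntropy:F2}");
```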
Detection pipeline. Magic bytes → parallel trial decompression (early-termination on low-entropy output) → extension fallback → deep probe (header parse + structural validation + integrity check).
Partition table support. MbrParser (four primaries at 0x1BE + extended/logical chain) + GptParser (EFI PART at LBA 1) + PartitionTypeDatabase (type-byte / GUID → filesystem name). Recursive descent via --recursive: disk image → partition table → filesystem → archive chain.
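For orientation, the classic layout MbrParser reads — a 0x55AA boot signature at offset 510 and four 16-byte partition entries starting at 0x1BE — can be walked like this (a standalone sketch of the well-known on-disk format, not the library's implementation):

```csharp
using System;
using System.IO;

static void DumpMbrPrimaries(Stream disk)
{
    var sector = new byte[512];
    disk.ReadExactly(sector);
    if (sector[510] != 0x55 || sector[511] != 0xAA)
        throw new InvalidDataException("missing 0x55AA boot signature");

    for (int i = 0; i < 4; i++)                   // four primary slots
    {
        int o = 0x1BE + i * 16;                   // 16-byte entries at 0x1BE
        byte type = sector[o + 4];                // partition-type byte
        uint lba  = BitConverter.ToUInt32(sector, o + 8);  // start LBA (LE)
        uint len  = BitConverter.ToUInt32(sector, o + 12); // sector count (LE)
        if (type != 0)
            Console.WriteLine($"#{i}: type=0x{type:X2} start={lba} sectors={len}");
    }
}
```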
Two complementary flows for reverse-engineering unknown compression tools and file formats.
Black-box tool probing runs the target tool with ~40 controlled probe inputs (empty, single byte, incrementing patterns, text, random data, various sizes 0–64 KB), cross-correlates all outputs, and reports: magic bytes, size-field offsets (LE/BE, 2/4/8 byte), the compression algorithm (trial decompression against all 49 building blocks), filename storage (UTF-8 / UTF-16), determinism, payload entropy.
```sh
cwb reverse-engineer MyTool.exe "{input} {output}"
cwb reverse-engineer packer.exe "--pack {input} --out {output}" --timeout 10000
```

The GUI offers the same via Tools → Reverse Engineer Format as a step-by-step wizard with progress reporting.
Static analysis mode works when you have archive files with known original content but no tool to run. StaticFormatAnalyzer accepts pairs of (original, archived) and locates where the content appears inside the archive — verbatim or compressed with any known building block — then infers header/footer structure, size fields, and compression algorithm without ever executing an external tool.
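A usage sketch under the same caveat — StaticFormatAnalyzer is the class this README names, but the method and report shape here are assumptions:

```csharp
using System;
using System.IO;

// Illustrative call shape only; the real API may differ.
var analyzer = new StaticFormatAnalyzer();
var report = analyzer.Analyze(
    original: File.ReadAllBytes("payload.txt"),   // known original content
    archived: File.ReadAllBytes("payload.pak"));  // the tool's output

Console.WriteLine($"payload at +{report.PayloadOffset}, algorithm={report.Algorithm}");
```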
Explorer context-menu integration. Right-click any file to invoke cwb commands directly: list, extract, test, optimise.
Self-extracting archive stubs for console and GUI use. The stub is a normal cwb-style reader prepended to an archive overlay; running the resulting exe extracts in place. Used for single-file distributions via Costura.Fody.
The test suite includes three tiers of external validation beyond the standard self-round-trip tests.
Self round-trip. All formats that support both create and extract are tested by creating an archive, extracting it, and verifying the output matches the original. Runs as part of the normal dotnet test.
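The pattern looks roughly like this (the NUnit scaffolding is standard; the format-specific Create/Extract calls are hypothetical stand-ins):

```csharp
using System;
using System.IO;
using NUnit.Framework;

[TestFixture]
public sealed class RoundTripTests
{
    [Test]
    public void Create_then_extract_returns_original_bytes()
    {
        byte[] original = new byte[4096];
        new Random(42).NextBytes(original);                // deterministic test data

        using var archive = new MemoryStream();
        SomeFormat.Create(archive, "entry.bin", original); // hypothetical writer

        archive.Position = 0;
        byte[] restored = SomeFormat.ExtractSingle(archive, "entry.bin"); // hypothetical reader

        Assert.That(restored, Is.EqualTo(original));
    }
}
```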
External tool interop (Category=EndToEnd). Verifies our output is readable by external tools and vice versa. Dynamic tool discovery via PATH and common install locations; gracefully skips when tools are unavailable. Covered: 7z, gzip, bzip2, xz, zstd, lz4, tar. Both directions are tested: create with our library → read with external tool, and vice versa.
```sh
dotnet test --filter "Category=EndToEnd"
```

.NET BCL interop. Verifies interoperability with System.IO.Compression (GZipStream, DeflateStream, BrotliStream, ZipArchive).
OS integration (Category=OsIntegration). Platform-specific tooling:
- Windows — PowerShell Compress-Archive / Expand-Archive, Windows tar, certutil, Mount-DiskImage, DISM
- Linux — mtools (FAT), genisoimage (ISO), qemu-img (virtual disks), debugfs (ext4), cpio
Platform detection + Assert.Ignore means tests never fail due to missing prerequisites.
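The gating idiom is plain NUnit; a sketch (`FindOnPath` is a hypothetical helper):

```csharp
using NUnit.Framework;

[Test]
[Category("OsIntegration")]
public void Gzip_output_is_readable_by_system_gzip()
{
    string? gzip = FindOnPath("gzip");  // hypothetical PATH-probe helper
    if (gzip is null)
        Assert.Ignore("gzip not found on PATH — skipping interop test");

    // ... create with our library, then shell out to `gzip -t` on the result ...
}
```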
```sh
dotnet test --filter "Category=OsIntegration"
```

Filesystem validation matrix. Compression.Tests/ExternalFsInteropTests.cs wires 18 filesystem-image tests against the tools below:
| Tool | Present? | Validates |
|---|---|---|
| 7-Zip (portable) | Bundled | NTFS, FAT, exFAT, ext, HFS, HFS+, ISO 9660, UDF, SquashFS, CramFS (list/extract) |
| qemu-img | Optional — install from https://qemu.weilnetz.de/w64/ | VHD, VMDK, QCOW2, VDI (info + check) |
| DISM | Windows built-in | WIM, VHD, ISO |
| chkdsk | Windows built-in (admin + mounted volume) | FAT, exFAT, NTFS |
| mtools | Optional — install from Cygwin | FAT (non-admin) |
| WSL + mkfs.* / fsck.* | Optional — wsl --install as admin + reboot | ext / XFS / Btrfs / F2FS / JFS / ReiserFS / UDF / UFS |
| DOSBox-X + MS-DOS 6.0/6.2 | Opt-in — set CWB_MSDOS_DBLSPACE_BOOT_IMG | DBLSPACE CVF (DBLSPACE /CHKDSK D:) — see Compression.Tests/Support/MsDosImageStaging.md |
| DOSBox-X + MS-DOS 6.22 | Opt-in — set CWB_MSDOS_DRVSPACE_BOOT_IMG | DRVSPACE CVF (DRVSPACE /CHKDSK D:) — see Compression.Tests/Support/MsDosImageStaging.md |
| DOSBox-X + FreeDOS LiveCD | Auto (hash-pinned download) | FAT (CHKDSK D: from FreeDOS) — gate is [Explicit] because the LiveCD welcome screen races the autoexec |
Tests skip cleanly when the tool is missing; they never fail the suite on a tool-deficient machine.
Principles:
- No external compression code. Every algorithm is implemented from scratch in C#.
- Composable primitives. Compression.Core provides the building blocks; FileFormat.* / FileSystem.* projects compose them. Compression.Core never implements format interfaces — it is pure algorithm.
- Stream-oriented. All compression / decompression operates on System.IO.Stream.
- Immutable headers. File-format header structures are immutable record types.
- Testability. Every component is independently testable; NUnit tests cover primitives, format round-trips, and external interop.
- .NET 10 / C# 14. Latest language features, nullable reference types, warnings-as-errors.
Registry. The source generator (Compression.Registry.Generator) emits a RegisterFormats() method listing every IFormatDescriptor and a FormatDetector.Format enum with one entry per format — zero reflection, zero hand-maintained lists. The same mechanism discovers IBuildingBlock implementations in Compression.Core.
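The emitted registration is plain generated C#; roughly like this (illustrative — the actual shape is whatever Compression.Registry.Generator emits, and the descriptor names here are hypothetical):

```csharp
// <auto-generated/> — sketch of what the generator's output might look like.
public static partial class FormatRegistry
{
    public static void RegisterFormats(IFormatRegistry registry)
    {
        registry.Add(new FileFormat.Zip.ZipDescriptor());   // hypothetical names
        registry.Add(new FileFormat.Tar.TarDescriptor());
        // ... one line per discovered IFormatDescriptor, generated at compile time
    }
}
```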
```sh
dotnet build CompressionWorkbench.slnx
dotnet test
```

- RFCs: RFC 1951 (Deflate), RFC 1952 (Gzip), RFC 1950 (Zlib), RFC 7932 (Brotli), RFC 8878 (Zstandard)
- libxad — the external archive decompressor, format reference
- XADMaster / The Unarchiver — modern continuation of libxad
- libarchive — multi-format reference
- Wikipedia list of archive formats
- ArchiveTeam Just Solve The File Format Problem — compression format documentation
- 7-Zip — multi-archiver reference
- Matt Mahoney's data-compression page — context-mixing compressors + corpus