This repo contains Canonical Huffman code implementation from scratch. This can be used to compress files containing ASCII characters.
In the project directory you can find the following:
- src - directory with all sources for compiling:
- main.c - entry point, contains logic to start the application,
- app - contains state machine for compress and decompress data,
- arg_parser - parser for program options,
- huffman - contains main logic for Huffman encoding:
- huffman.c - contains logic for building Huffman Tree, Huffman Codes and call Canonical Huffman,
- huffman_utils - contains logic for frequency table,
- canonical_huffman - contains logic for Canonical Huffman codes,
- io_utils - contains utils for parsing program options, read and write metadata and data to file.
 
 
- Read data and building Frequency table:
- read chunk of data,
- update frequency table, which is implemented as LUT (char array of size 128 for accessing char frequency in O(1) time),
- repeat until EOF is reached.
 
- Build Huffman Tree and assign Huffman Codes:
- create min-heap from Frequency table based on character frequency (exclude zero frequency char),
- build Huffman Tree with min-heap to select the two minimal nodes and insert the new one in O(n*logn) time.
- assign initial Huffman codes.
 
- Canonical Huffman code:
- sort Huffman codes by code length,
- create Canonical Huffman table for codes as LUT,
- assign first character code 0 with length of Huffman code,
- increase next character code by 0b1,
- shift character code by difference in code length.
 
- Encode and save data:
- read chunk of data,
- encode chunk of data with Canonical Huffman table,
- save chunk of data to file,
- repeat until EOF is reached.
 
- Save metadata:
- transform Canonical Huffman table to format for saving,
- save count of shift bits, codes and number of significant bits for each code to .metadata.
 
- Read metadata:
- read count of shift bits,
- read Canonical Huffman codes and save as LUT.
 
- Read and decode data:
- read data in chunks,
- decode chunk of data with Canonical Huffman code,
- save decoded chunk of data,
- repeat until EOF is reached.
 
First compile source files into one executable file. Simply start Makefile script from project folder as follows:
make after compiling is completed, huffman executable file can be found in build/ directory.
To run main program the following command:
./build/huffmanalso don't forget to provide program options for input, output and operation.
To compress files or text by running main program you should also provide paths some arguments:
- use -i or --input for input file,
- use -o or --output for output file,
- use -c for compression,
- use -d for decompression.
NOTE: after data compression is finished successfully, metadata are saved in .metadata file.
NOTE: The output file is written in chunks. If the specified output file already exists, the data will be appended to the end instead of overwriting the file.
Example of input text can be found in input/ directory.
./build/huffman -i input/LoremIpsum.txt -o output/LoremIpsum.huff -cthis command will result in a compressed LoremIpsum.huff file of 3707 bytes instead of the original 6902 bytes:


./build/huffman -i output/LoremIpsum.huff -o output/LoremIpsum_decoded.txt -dthis command will result in a decompressed LoremIpsum_decoded.txt file with original text from LoremIpsum.txt.
