Archive file viewer

Administrative:

Shortcuts:
- GUI = graphical user interface
- CMD = bash commands
- HUF = Huffaman trees functions
- LZW = Lempel-Ziv-Welch functions
Map of execution pseudocode:

choose gui / bash boot
choose option, algorithm and paths
   if compression 
      create a temporary tar file (encoded in ASCII)
      compression is done
      delete temporary tar file
      place the encode file
   if decompression

Data structures:
- Heap
- Binary trees
- Priority Queue
- Dictionary
- Hash Table
Workspace:
- admin
- usage
- constant files/test files - exemples of execution
Principles:
- the path to a folder always contains at the end of the string "/"
- GUI-function has two components: visual method, functional method
- object-oriented programming
Wireframes (screens):

menu -> explorer -> menu + status box test (type string) -> output shown -> menu + status box info (info-choose) -> TAR principle -> adaptation -> pseudocode -> exemples -> menu -> HUF algorithm -> adaptation -> pseudocode -> exemples -> menu -> LZW algorithm -> adaptation -> pseudocode -> exemples -> menu

I. Boot:

two ways of running the program:
- double-click on start.exe => opens GUI
- openning the folder in command prompt
  - type ./name_of_executable *arguments*
  - if no arguments provided => opens GUI

1. GUI

menu => file explorer / more information / test algorithms with a string
explorer stages:
1. choose partition
2. choose files (displays: buttons, curent path, files, selected_files)
3. status box
buttons:

Index		Class			Name		What it does
-------------------------------------------------------------------------------------------------
0 		button_0		exit		force the app to close
1 		button_menu_0		stored files	opens the file explorer
2 		button_menu_1   	+information	print exemples and 
3 		button_menu_2		test strings	convert keyboard input into huf/lzw code
4		button_expl_static	option		compression / decompression
5		button_expl_static	algorithm	HUF / LZW
6		button_expl_0           select		prepare files for option

2. CMD

Usage: ./name_of_executable <operation> <algorithm> <nr_input_files> <input_file_paths> <output_path>
multiple files can be selectected to be encoded and decoded in one step
some exemples:

.boot.exe decompress HUF 1 files/encoded/unpack/huf files/decoded/ originalFiles

./bin/Debug/BetterWR.exe compress LZW 2 '/constant files/test files/to_encode/element_1/' '/constant files/test files/to_encode/element_2/' unpack

II. Algorithms:

1. HUF algorithm

Huffman encoding involves computing the frequency for each unique character that appears in the initial string
each character and its corresponding frequency is stored in a map => build the Huffman tree
Huffman tree necessary for both encoding and decoding
based on the tree, we convert the initial string into a string of binary values - grouped such that a byte(8 bits) can be created and converted into the corresponding ASCII character.
finally, the resulting string is encoded and ready to be written to the output file. Also in this file we will keep the information needed to reconstruct the tree, i.e. each character together with its frequency plus the size of the tree. When we read the contents of the file to perform the decoding, we make sure that we first read the tree data correctly and then read the encoded string. Decoding will be done by traversing the reconstructed tree, and when the encoded value corresponding to a character is found, we will write it to our created decoding string.
encoding/decoding pseudocode:

Creation of a Priority Queue Q, which contains each character
Sorting in ascending order of their frequencies 
For all unique characters:
    Create a newNode
    Extraction of the minimum value from Q and assigning it to the leftChild of the newNode 
    Retrieving the minimum value from Q and assigning it to the right child rightChild of newNode
    Computing the sum of the two minimum values and assigning this value to the newNode
    Inserting the newNode into the tree
return root

2. LZW algorithm

Lempel-Ziv-Welch algorithm is a lossless data compression algorithm. Abraham Lempel, Jakob Ziv and Terry Welch are the scientists who developed it. Procedure: it scans a file for data patterns that occur multiple times, which are saved later in a dictionary and their references are placed in a compressed file whenever repetitive data occurs. In hardware implementations, the algorithm is simple and has the potential for very high throughput. Other usages: GIF image format, Unix file compression utility.
the dictinonary is initialised with all the char elements from 0 to 255.
dictionary's memory + compressed message's memory < uncompressed message's memory
dictionary is not send with the compressed message - you build the dictionary along the way (the receiver will constructor decoder alone)
ideal for repetitive and long strings
no prior information about the input data stream
compress the input stream in one single step
fast execution
enconding pseudocode:

repeat
	repeat
		if temp is in dictionary
		    add the following character
	until it is no longer in the dictionary
	add it to the dictionary
	output the part that was already in the dictionary 
	//((strlen(temp)>1) means it is in the dictionary - memory saving method)
	
	reset variables 
until they reach the end of the input

basic principle: the first character is always in the dictionary and the second character if it is not, means it is a copy of the previous one
decode pseudocode:

repeat
    if not in dictionary next
	    copy previous
    else
	    we put it in the analysis on next
	output<<temp
    add to dictionary(previous, current)
until final input

III. Packaging:

1. Tar format (used in my case)

tape archive file format is an UNIX-based utility
connects files (tarballs) without compression - often compressed with gzip or bzip2.
data encoded in ASCII
each file object includes its data and a 512-byte record-anthet. Files' data is written unmodified, except that its length is rounded to a multiple of 512 bytes. At the end of the archive file there are two 512-byte blocks, filled with binary zeros, to mark the end of the file. To ensure portability between different architectures with different byte orderings, the information in the header record is encoded in ASCII. TAR files are fully compatible between UNIX and Windows systems because all header information is represented in ASCII.
Pros and cons:
- we can use any compression algorithm after the unification
- deletes doubled information (files)
- can lose metadata of files (index of files)
how content of the tar.bin file will look like:

element_1//
	forth.txt

patru s-a luat
cuci s-au aruncat
hahalero, kokos s-a diluat
am ster so mor decaptat

*5EPArAToR*

	lyrics/
		1.txt

hello world
pleasure to meet youu

*5EPArAToR*

	end_oF_lyrics/
end_oF_element_1//

2. Zip format

.zip extension is an archive that can hold files. The archive may have compression applied to the included files. As soon as the then existing specifications were made available, many companies made the ZIP file format part of their software utilities, including Microsoft (since Windows 7), Apple (Mac OS X) and many others.
the file is a compressed archive that supports lossless data compression. It is often used to send zipped email attachments; this way the message cannot be blocked by email server filters. It can also be used to hide a file type or prevent it from being opened.
the order of events: firstly individual compression of the files then the unification
pros and cons:
- you can view and unarchive only one archive item
- parallelize the archiving process on the CPU -> faster
- compression ratio -- (join duplicates, something that is redundant)

3. Binary files

images: jpg, png, gif, bmp.
videos: mp4, mkv, avi, mov.
audio: mp3, aac, wav, flac.
documents: pdf, doc, xls.
archive: zip, rar, 7z, tar.
executable: exe, dll, so.

4. Text files

web standards: html, xml, css, svg, json, ...
source code: c, cpp, h, cs, js, py, java, rb, pl, php, sh, ...
documents: txt, tex, markdown, asciidoc, rtf, ps, ...
configuration: ini, cfg, rc, reg, ...

Source

IV Limitations:

Small number for input paths
From the graphical point of view:

graphics.h is a primitive library Additional documentation

32-bit compiler - it was used such that the project could support graphics.h

V Future improvements:

first alpha version launch: 10 june 2024

Add more languages:

english
romanian
french

use temp.tar as a binary file s. t. binary files coud be read. !!!
concrete implementation of the following menu buttons:

"+information"
"with strings"

navigation of the explorer with keyboard

VI How to set up this project in CodeBlocks

Have g++ follow at least the C++17 GNU C++ language standard (ISO C++ plus GNU extensions)
Install a 32-bit compiler and set it up
Install graphics.h

set up linker settings: -lbgi -lgdi32 -lcomdlg32 -luuid -loleaut32 -lole32

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
admin		admin
bin/Debug		bin/Debug
constant files		constant files
obj/Debug		obj/Debug
usage		usage
app.cbp		app.cbp
app.depend		app.depend
app.layout		app.layout
readMe.md		readMe.md
start.exe		start.exe

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Archive file viewer

Administrative:

I. Boot:

1. GUI

2. CMD

II. Algorithms:

1. HUF algorithm

2. LZW algorithm

III. Packaging:

1. Tar format (used in my case)

2. Zip format

3. Binary files

4. Text files

IV Limitations:

V Future improvements:

VI How to set up this project in CodeBlocks

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Archive file viewer

Administrative:

I. Boot:

1. GUI

2. CMD

II. Algorithms:

1. HUF algorithm

2. LZW algorithm

III. Packaging:

1. Tar format (used in my case)

2. Zip format

3. Binary files

4. Text files

IV Limitations:

V Future improvements:

VI How to set up this project in CodeBlocks

About

Resources

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages