
Maybe try another zip module #1241

@jimmywarting

Description


I've built a zero-dependency, streamable zip library using only web standards, meaning it works seamlessly across all environments: Node.js, Deno, Bun, and all browsers.

Since it doesn't depend on node:buffer, node:stream, or any Node-specific APIs, it's truly cross-environment friendly.
And if you ever want this to work in runtimes other than Node.js, this would be my recommendation.

One key benefit is that it operates on the W3C standard File API, making random access to uncompressed entries in a zip file incredibly fast. Reading one is as simple as:

const blob = zipBlob.slice(start, end)
const text = await blob.text()

This means ultra-fast performance because it leverages native browser APIs. 🚀
For compressed entries, it uses the standard DecompressionStream('gzip').
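To illustrate the point about web-standard primitives, here is a minimal sketch that round-trips a string through `CompressionStream`/`DecompressionStream` (both global in Node 18+, Deno, Bun, and browsers). The `gzipRoundTrip` name is mine, not part of the library:

```javascript
// Sketch: round-trip text through the same web-standard streams the
// library relies on for compressed entries. No Node-specific APIs needed.
async function gzipRoundTrip(text) {
  // Compress: Blob -> byte stream -> gzip stream -> collect into a Blob
  const compressed = await new Response(
    new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'))
  ).blob();

  // Decompress: Blob -> byte stream -> gunzip stream -> collect as text
  return await new Response(
    compressed.stream().pipeThrough(new DecompressionStream('gzip'))
  ).text();
}
```

The same pattern works in every environment that implements the Compression Streams spec, which is exactly the cross-environment argument being made here.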


Here's a preview of how it could integrate into your codebase:

Code for writing zip files before:

vscode-vsce/src/package.ts

Lines 1825 to 1858 in 2aeafb2

function writeVsix(files: IFile[], packagePath: string): Promise<void> {
	return fs.promises
		.unlink(packagePath)
		.catch(err => (err.code !== 'ENOENT' ? Promise.reject(err) : Promise.resolve(null)))
		.then(
			() =>
				new Promise((c, e) => {
					const zip = new yazl.ZipFile();
					const zipOptions: Partial<yazl.Options> = {};
					// reproducible zip files
					const sde = process.env.SOURCE_DATE_EPOCH;
					if (sde) {
						const epoch = parseInt(sde);
						zipOptions.mtime = new Date(epoch * 1000);
						files = files.sort((a, b) => a.path.localeCompare(b.path));
					}
					files.forEach(f =>
						isInMemoryFile(f)
							? zip.addBuffer(typeof f.contents === 'string' ? Buffer.from(f.contents, 'utf8') : f.contents, f.path, { ...zipOptions, mode: f.mode })
							: zip.addFile(f.localPath, f.path, { ...zipOptions, mode: f.mode })
					);
					zip.end();
					const zipStream = fs.createWriteStream(packagePath);
					zip.outputStream.pipe(zipStream);
					zip.outputStream.once('error', e);
					zipStream.once('error', e);
					zipStream.once('finish', () => c());
				})
		);
}

What it could look like when writing zip files with this library:

// import * as yazl from 'yazl';
import ZipWriter from 'zip-go/lib/write.js'

function writeVsix(files: IFile[], packagePath: string): Promise<void> {
	return fs.promises
		.unlink(packagePath)
		.catch(err => (err.code !== 'ENOENT' ? Promise.reject(err) : Promise.resolve(null)))
		.then(async () => {
			// reproducible zip files
			const sde = process.env.SOURCE_DATE_EPOCH;
			files = sde ? files.sort((a, b) => a.path.localeCompare(b.path)) : files;

			const fileStream = ReadableStream.from((async function* () {
				const lastModified = sde ? parseInt(sde) * 1000 : Date.now();

				for (let file of files) {
					if ('contents' in file) {
						yield new File([file.contents], file.path, { lastModified });
					} else {
						const blob = await fs.openAsBlob(file.localPath);
						yield new File([blob], file.path, { lastModified });
					}
				}
			})());

			await fs.promises.writeFile(
				packagePath,
				fileStream.pipeThrough(new ZipWriter())
			);
		})
}

Code for reading zip files before:

vscode-vsce/src/zip.ts

Lines 8 to 48 in 2aeafb2

async function bufferStream(stream: Readable): Promise<Buffer> {
	return await new Promise((c, e) => {
		const buffers: Buffer[] = [];
		stream.on('data', buffer => buffers.push(buffer));
		stream.once('error', e);
		stream.once('end', () => c(Buffer.concat(buffers)));
	});
}

export async function readZip(packagePath: string, filter: (name: string) => boolean): Promise<Map<string, Buffer>> {
	const zipfile = await new Promise<ZipFile>((c, e) =>
		open(packagePath, { lazyEntries: true }, (err, zipfile) => (err ? e(err) : c(zipfile!)))
	);

	return await new Promise((c, e) => {
		const result = new Map<string, Buffer>();
		zipfile.once('close', () => c(result));
		zipfile.readEntry();
		zipfile.on('entry', (entry: Entry) => {
			const name = entry.fileName.toLowerCase();
			if (filter(name)) {
				zipfile.openReadStream(entry, (err, stream) => {
					if (err) {
						zipfile.close();
						return e(err);
					}
					bufferStream(stream!).then(buffer => {
						result.set(name, buffer);
						zipfile.readEntry();
					});
				});
			} else {
				zipfile.readEntry();
			}
		});
	});
}

What it could look like when reading zip files with this library:

import { openAsBlob } from 'node:fs';
import zipReader from 'zip-go/lib/read.js';

export async function readZip(packagePath: string, filter: (name: string) => boolean): Promise<Map<string, Buffer>> {
	const zipFile = await openAsBlob(packagePath);
	const result = new Map<string, Buffer>();

	for await (const entry of zipReader(zipFile)) {
		const name = entry.name.toLowerCase();
		if (filter(name)) {
			const bytes = await entry.arrayBuffer();
			result.set(name, Buffer.from(bytes));
		}
	}

	return result;
}

This approach introduces minimal breaking changes to the codebase. However, I'd personally recommend this alternative:

const result = new Map<string, FileLike>();

for await (const entry of zipReader(zipFile)) {
	const name = entry.name.toLowerCase();
	if (filter(name)) {
		result.set(name, entry);
	}
}

// Later, you can use:
const entry = result.get(path)

// For getting a true native File object (and not some file like object)
await entry.file()

// All of these methods exist on Response, Request, File, and Blob, making it very flexible on an entry as well:
await entry.text()
await entry.bytes()
await entry.arrayBuffer()
entry.stream().pipeTo(...)
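Since any spec-compliant entry exposes the same accessor surface as `Blob`, a plain `Blob` can stand in for an entry to show what consuming code looks like. The `inspect` helper below is illustrative, not part of the library:

```javascript
// Sketch: the Blob/File accessor surface that a zip entry would share.
// A plain Blob stands in for an entry here.
async function inspect(entry) {
  const text = await entry.text();          // decode as UTF-8 string
  const buf = await entry.arrayBuffer();    // raw bytes
  return { text, byteLength: buf.byteLength };
}

const entry = new Blob(['hello']);
```

Code written against this surface works unchanged whether the entry is backed by a slice of the zip blob or by a decompression stream.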

I'd also suggest this improvement:

- function writeVsix(files: IFile[], packagePath: string): Promise<void> {
+ function writeVsix(files: File[], packagePath: string): Promise<void> {

Instead of creating "memory files" with { content: "..." }, you'd create actual File objects using new File([content], path, { ... }), or use openAsBlob directly for files on disk before calling writeVsix.


Interested in benchmarks?
Check out the performance comparisons here:
https://github.com/jimmywarting/zip-benchmark.js

And here's a browser benchmark as well:
https://jimmywarting.github.io/zip-benchmark.js/browser-benchmark.html

(Note: yauzl shows significant performance penalties when running in non-Node.js environments)

Another reason I think my zip library is a better fit is that it needs no network, disk, or other I/O permissions, which makes it very sandbox-friendly: you hand the library the data it needs and it hands results back. I don't think libraries should have any I/O access at all; only the application should have that privilege.
