Another f*cking PDF parser. Because parsing PDFs in Node.js should be easy. Live long and parse PDFs.
There are plenty of PDF-related packages for Node.js. They work… until they don’t.
Afpp was built to solve the headaches I ran into while trying to parse PDFs in Node.js:
- Do I need a package with 30+ MB just to read a PDF?
- Why is the event loop blocked?
- Is that a memory leak I smell?
- Should reading a PDF really be this performance-heavy?
- Why is everything so buggy?
- Why does it complain about the lack of a canvas in Node.js?
- Why does canvas require native C++/Python dependencies to build?
- Why does it complain about the missing window object?
- Why do I need ImageMagick for this?!
- What the hell is Ghostscript, and why does it keep failing?
- Where’s the TypeScript support?
- Why are the dependencies older than my dev career?
- Why does everything work… until I try an encrypted PDF?
- Why does every OS need its own special setup ritual?
- Node.js >= v22.14.0
You can install afpp via npm, Yarn, or pnpm.
npm install afpp
yarn add afpp
pnpm add afpp
The afpp library makes it simple to extract text or images from PDF files in Node.js. Whether your PDF is stored locally, hosted online, or encrypted, afpp provides an easy-to-use API to handle it all. All functions share the same parameters and accept a string path, a buffer, or a URL object.
import { readFile } from 'fs/promises';
import path from 'path';
import { pdf2string } from 'afpp';
(async function main() {
  const pathToFile = path.join('..', 'test', 'example.pdf');
  const input = await readFile(pathToFile);
  const data = await pdf2string(input);
  console.log('Extracted text:', data); // ['page 1 content', 'page 2 content', ...]
})();
import { pdf2image } from 'afpp';
(async function main() {
  const url = new URL('https://pdfobject.com/pdf/sample.pdf');
  const arrayOfImages = await pdf2image(url);
  console.log(arrayOfImages); // [imageBuffer, imageBuffer, ...]
})();
import { parsePdf } from 'afpp';
(async function main() {
  // Download PDF from URL
  const response = await fetch('https://pdfobject.com/pdf/sample.pdf');
  const buffer = Buffer.from(await response.arrayBuffer());
  // Parse the PDF buffer
  const result = await parsePdf(buffer, {}, (content) => content);
  console.log('Parsed PDF:', result);
})();
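Since every function takes a string path, a buffer, or a URL object, it can help to validate input on the caller's side before handing it over. The snippet below is a minimal sketch of such a guard; `inputKind` is a hypothetical helper written for this example and is not part of afpp, and the library's own input handling may well differ.

```javascript
// Hypothetical caller-side helper (NOT part of afpp): classify a value
// as one of the three input kinds described above, or reject it early.
function inputKind(input) {
  if (typeof input === 'string') return 'path';
  if (input instanceof URL) return 'url';
  if (Buffer.isBuffer(input)) return 'buffer';
  throw new TypeError('Expected a string path, a Buffer, or a URL object');
}

console.log(inputKind('./example.pdf')); // 'path'
console.log(inputKind(new URL('https://pdfobject.com/pdf/sample.pdf'))); // 'url'
console.log(inputKind(Buffer.from('%PDF-'))); // 'buffer'
```

Failing fast with a `TypeError` keeps a bad argument from turning into a confusing parse error further down the line.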
Options common to all afpp functions. Example usage:
const result = await parsePdf(buffer, {
  concurrency: 5,
  imageEncoding: 'jpeg',
  password: 'STRONG_PASS',
  scale: 4,
});
optional concurrency: number
Concurrency level for page processing. Defaults to 1. Higher values may improve performance but increase memory usage.
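To make the trade-off concrete, here is a rough sketch of what a concurrency limit generally means: at most `limit` pages are processed at once, so more pages (and their buffers) are held in memory at the same time as the limit grows. `mapWithConcurrency` is a hypothetical helper written for this illustration; it is not afpp's implementation, which may work differently.

```javascript
// Hypothetical helper (NOT afpp's internals): run `worker` over `items`
// with at most `limit` tasks in flight, keeping results in input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been claimed yet

  // Each runner repeatedly claims the next unclaimed item until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  // Start no more runners than there are items, and at least one.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With `limit` set to 1 the pages are handled strictly one after another, which matches the default described above; raising it lets slow pages overlap at the cost of more work held in memory at once.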
optional imageEncoding: ImageEncoding
Image encoding format used when rendering non-text pages. Defaults to 'png'. Supported formats: 'avif', 'jpeg', 'png', 'webp'.
optional password: string
Password for encrypted PDF files.
optional scale: number
Scale factor applied when a page is rendered as an image (non-text content, or when pdf2image is used). Defaults to 2.0. Higher values increase image resolution but also memory usage.
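A back-of-the-envelope calculation shows why memory grows quickly with scale. The sketch below assumes that one PDF point maps to one pixel at scale 1 and that the rendered bitmap uses 4 bytes per pixel (RGBA) before encoding; both are assumptions made for illustration, and `estimateRenderSize` is a hypothetical helper, not an afpp function.

```javascript
// Hypothetical estimate (NOT an afpp API). Assumes 1 PDF point = 1 pixel
// at scale 1 and 4 bytes per pixel (RGBA) for the unencoded bitmap.
function estimateRenderSize(widthPt, heightPt, scale = 2.0) {
  const width = Math.ceil(widthPt * scale);
  const height = Math.ceil(heightPt * scale);
  return { width, height, rawBytes: width * height * 4 };
}

// A US Letter page is 612 x 792 points.
console.log(estimateRenderSize(612, 792));    // { width: 1224, height: 1584, rawBytes: 7755264 }
console.log(estimateRenderSize(612, 792, 4)); // { width: 2448, height: 3168, rawBytes: 31021056 }
```

Under these assumptions, doubling the scale quadruples the pixel count, so going from 2 to 4 takes the raw bitmap for one page from roughly 7.8 MB to roughly 31 MB.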
This project is licensed under the terms of the MIT License.