Because we all needed just one more way to deal with PDFs. Fast, efficient, minimal. Zero bloat, one dependency. Because we all needed another f*cking pdf parser.

l2ysho/afpp

afpp

Another f*cking PDF parser. Because parsing PDFs in Node.js should be easy. Live long and parse PDFs. 🖖

Why?

There are plenty of PDF-related packages for Node.js. They work… until they don't.

Afpp was built to solve the headaches I ran into while trying to parse PDFs in Node.js:

  • 📦 Do I need a package with 30+ MB just to read a PDF?
  • 🧵 Why is the event loop blocked?
  • Is that a memory leak I smell?
  • 🐌 Should reading a PDF really be this performance-heavy?
  • 🐞 Why is everything so buggy?
  • 🎨 Why does it complain about the lack of a canvas in Node.js?
  • 🧱 Why does canvas require native C++/Python dependencies to build?
  • 🪟 Why does it complain about the missing window object?
  • 🪄 Why do I need ImageMagick for this?!
  • 👻 What the hell is Ghostscript, and why does it keep failing?
  • ❌ Where's the TypeScript support?
  • 🧓 Why are the dependencies older than my dev career?
  • 🔐 Why does everything work… until I try an encrypted PDF?
  • 🕯️ Why does every OS need its own special setup ritual?

Prerequisites

  • Node.js >= v22.14.0

📦 Installation

You can install afpp via npm, Yarn, or pnpm.

npm

npm install afpp

Yarn

yarn add afpp

pnpm

pnpm add afpp

Getting started

The afpp library makes it simple to extract text or images from PDF files in Node.js. Whether your PDF is stored locally, hosted online, or encrypted, afpp provides an easy-to-use API to handle it all. All functions share the same parameters and accept a string path, a Buffer, or a URL object as input.

Get text from path

import { readFile } from 'fs/promises';
import path from 'path';

import { pdf2string } from 'afpp';

(async function main() {
  const pathToFile = path.join('..', 'test', 'example.pdf');
  const input = await readFile(pathToFile);
  const data = await pdf2string(input);

  console.log('Extracted text:', data); // ['page 1 content', 'page 2 content', ...]
})();
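
As the comment in the example indicates, the result is an array with one string per page, so simple post-processing needs no extra tooling. The following is a minimal sketch that assumes that shape; `findPages` is a hypothetical helper and not part of afpp.

```typescript
// Hypothetical helper (not part of afpp): given per-page strings in the shape
// shown above, list the 1-based page numbers whose text contains a search
// term, ignoring case.
function findPages(pages: string[], term: string): number[] {
  const needle = term.toLowerCase();
  const hits: number[] = [];
  pages.forEach((text, index) => {
    if (text.toLowerCase().includes(needle)) {
      hits.push(index + 1); // report 1-based page numbers
    }
  });
  return hits;
}

// Stand-in data; in real use this would be `await pdf2string(input)`.
const samplePages = ['Invoice 2024', 'Terms and conditions', 'invoice summary'];
console.log(findPages(samplePages, 'invoice')); // [ 1, 3 ]
```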

Get image from URL

import { pdf2image } from 'afpp';

(async function main() {
  const url = new URL('https://pdfobject.com/pdf/sample.pdf');
  const arrayOfImages = await pdf2image(url);

  console.log(arrayOfImages); // [imageBuffer, imageBuffer, ...]
})();
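
Since the result is an array of image buffers, persisting it is a matter of writing each buffer to disk. The sketch below assumes one buffer per page, in page order, and the default 'png' encoding; `pageFileName` and `saveImages` are hypothetical helpers and not part of afpp.

```typescript
import { mkdir, writeFile } from 'node:fs/promises';
import { join } from 'node:path';

type ImageEncoding = 'avif' | 'jpeg' | 'png' | 'webp';

// Build a zero-padded name so files sort in page order,
// e.g. pageFileName(3, 12, 'png') gives "page-03.png".
function pageFileName(pageNumber: number, pageCount: number, encoding: ImageEncoding): string {
  const width = String(pageCount).length;
  return `page-${String(pageNumber).padStart(width, '0')}.${encoding}`;
}

// Write every buffer into outDir and return the paths that were written.
// Assumption: images[i] holds page i + 1. Buffer is a Uint8Array subclass,
// so the buffers returned by pdf2image fit this parameter type.
async function saveImages(
  images: Uint8Array[],
  outDir: string,
  encoding: ImageEncoding = 'png',
): Promise<string[]> {
  await mkdir(outDir, { recursive: true });
  const paths: string[] = [];
  for (let i = 0; i < images.length; i++) {
    const filePath = join(outDir, pageFileName(i + 1, images.length, encoding));
    await writeFile(filePath, images[i]);
    paths.push(filePath);
  }
  return paths;
}
```

With the example above, `await saveImages(arrayOfImages, './out')` would write one file per page into `./out`. The encoding argument only controls the file extension here, so it should match the `imageEncoding` option that was used for rendering.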

Parse PDF buffer

import { parsePdf } from 'afpp';

(async function main() {
  // Download PDF from URL
  const response = await fetch('https://pdfobject.com/pdf/sample.pdf');
  const buffer = Buffer.from(await response.arrayBuffer());

  // Parse the PDF buffer
  const result = await parsePdf(buffer, {}, (content) => content);
  console.log('Parsed PDF:', result);
})();
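
The third argument is a callback that receives the extracted content. The option descriptions further down suggest that text pages yield strings while other pages are rendered as images, so a callback might branch on the type. This is a sketch under that assumption only: the exact callback signature is not documented here, and `describeContent` is a hypothetical name.

```typescript
// Hypothetical callback for parsePdf. Assumption: `content` is a string for
// text pages and a Buffer (a Uint8Array subclass) for pages rendered as images.
function describeContent(content: string | Uint8Array): string {
  if (typeof content === 'string') {
    return `text (${content.length} characters)`;
  }
  return `image (${content.byteLength} bytes)`;
}

// Stand-in values; with afpp the function would be passed as
// parsePdf(buffer, {}, describeContent).
console.log(describeContent('Hello PDF')); // text (9 characters)
console.log(describeContent(new Uint8Array([1, 2]))); // image (2 bytes)
```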

Interface: AfppParseOptions

Options shared by all afpp functions. Example usage:

const result = await parsePdf(buffer, {
  concurrency: 5,
  imageEncoding: 'jpeg',
  password: 'STRONG_PASS',
  scale: 4,
});
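
Writing the documented defaults out explicitly can help when logging or validating a configuration. The sketch below mirrors the documented interface with locally declared types; `ParseOptions`, `ResolvedOptions`, and `withDefaults` are illustrative names and are not imported from afpp.

```typescript
type ImageEncoding = 'avif' | 'jpeg' | 'png' | 'webp';

// Local mirror of the documented options, declared here for illustration.
interface ParseOptions {
  concurrency?: number;
  imageEncoding?: ImageEncoding;
  password?: string;
  scale?: number;
}

// The same options after defaults are applied; only password stays optional.
interface ResolvedOptions {
  concurrency: number;
  imageEncoding: ImageEncoding;
  password?: string;
  scale: number;
}

// Fill in the documented defaults: concurrency 1, imageEncoding 'png', scale 2.0.
function withDefaults(options: ParseOptions = {}): ResolvedOptions {
  return {
    concurrency: options.concurrency ?? 1,
    imageEncoding: options.imageEncoding ?? 'png',
    password: options.password,
    scale: options.scale ?? 2.0,
  };
}

console.log(withDefaults({ scale: 4 }));
```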

Properties

concurrency?

optional concurrency: number

Concurrency level for page processing. Defaults to 1. Higher values may improve performance but increase memory usage.

Default

1;
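
To illustrate what a concurrency level means in practice, the sketch below processes items with at most `limit` tasks in flight at once. It is a generic pattern and not afpp's actual implementation; `mapWithConcurrency` is a hypothetical helper.

```typescript
// Run `task` over `items`, keeping at most `limit` tasks in flight.
// Results are returned in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker repeatedly claims the next index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Example: "process" five fake pages, two at a time.
mapWithConcurrency([1, 2, 3, 4, 5], 2, async (page) => `page ${page}`).then(
  (labels) => console.log(labels),
);
```

With a limit of 1 the pages are handled strictly one after another. Raising the limit lets several pages be in progress at once, which is where the extra memory usage comes from.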

imageEncoding?

optional imageEncoding: ImageEncoding

Image encoding format when rendering non-text pages. Defaults to 'png'. Supported formats: 'avif', 'jpeg', 'png', 'webp'.

Default

'png';
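
When the encoding comes from user input such as a CLI flag or a query parameter, it can be checked against the four supported formats before it is passed on. A minimal sketch; `isImageEncoding` and `mimeTypeFor` are hypothetical helpers, and the type is declared locally instead of being imported from afpp.

```typescript
// The four formats listed as supported above.
const IMAGE_ENCODINGS = ['avif', 'jpeg', 'png', 'webp'] as const;
type ImageEncoding = (typeof IMAGE_ENCODINGS)[number];

// Type guard: narrows an arbitrary value to ImageEncoding.
function isImageEncoding(value: unknown): value is ImageEncoding {
  return (
    typeof value === 'string' &&
    (IMAGE_ENCODINGS as readonly string[]).includes(value)
  );
}

// The registered MIME type for each format is "image/" plus the format name.
function mimeTypeFor(encoding: ImageEncoding): string {
  return `image/${encoding}`;
}

const requested: unknown = 'webp'; // e.g. taken from a query string
if (isImageEncoding(requested)) {
  console.log(mimeTypeFor(requested)); // image/webp
}
```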

password?

optional password: string

Password for encrypted PDF files.


scale?

optional scale: number

Scale factor applied when a page is rendered as an image, that is, when its content is not text or when pdf2image is used. Defaults to 2.0. Higher values increase image resolution but also memory usage.

Default

2.0;
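
A rough estimate shows why higher scale values cost memory. The sketch assumes that the scale multiplies the page's dimensions in PDF points (1/72 inch) to give pixels, and that an uncompressed bitmap uses 4 bytes per pixel (RGBA). Both are assumptions about the rendering pipeline and not documented afpp behavior; `estimateRender` is a hypothetical helper.

```typescript
interface RenderEstimate {
  width: number; // pixels
  height: number; // pixels
  rawBytes: number; // uncompressed RGBA size
}

// Estimate the pixel size and raw bitmap size of a rendered page.
// Assumptions: pixels = points * scale, and 4 bytes per pixel.
function estimateRender(widthPt: number, heightPt: number, scale = 2.0): RenderEstimate {
  const width = Math.ceil(widthPt * scale);
  const height = Math.ceil(heightPt * scale);
  return { width, height, rawBytes: width * height * 4 };
}

// A US Letter page measures 612 x 792 points.
console.log(estimateRender(612, 792, 2)); // { width: 1224, height: 1584, rawBytes: 7755264 }
console.log(estimateRender(612, 792, 4)); // { width: 2448, height: 3168, rawBytes: 31021056 }
```

Under these assumptions, doubling the scale quadruples the pixel count, so a scale of 4 needs about four times the raw memory of the default 2.0 for every page in flight.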

License

This project is licensed under the terms of the MIT License.
