squareetlabs/LaravelPDF2HTML

Laravel PDF to HTML Converter

A robust Laravel package, free of third-party PHP dependencies, that converts PDF files to HTML using poppler-utils (pdftohtml).

Features

  • Dependency Free: Does not rely on external PHP packages.
  • Laravel Integration: Automatic discovery, config publishing, and easy-to-use API.
  • Binary Auto-Discovery: Automatically finds pdftohtml and pdfinfo binaries on your system.
  • Customizable: Extensive options for zooming, image handling, and output formatting.
  • Inline Assets: Automatically inlines CSS and Images (Base64) for a self-contained HTML output.
  • Strict Types: Written with modern PHP standards and strict typing.

Requirements

  • PHP > 8.1
  • poppler-utils installed on your server (contains pdftohtml and pdfinfo).
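
Before installing the package, it can help to confirm that both binaries are reachable. The following is a minimal sketch, not the package's own auto-discovery logic: `findBinary` is a hypothetical helper that walks the directories in `PATH` and returns the first executable match.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): look for an executable
 * named $name in the directories listed in $pathEnv (defaults to PATH).
 */
function findBinary(string $name, ?string $pathEnv = null): ?string
{
    // getenv() returns false when PATH is unset; the cast turns that into ''.
    $pathEnv ??= (string) getenv('PATH');

    foreach (explode(PATH_SEPARATOR, $pathEnv) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Report where each required poppler-utils binary was found, if anywhere.
foreach (['pdftohtml', 'pdfinfo'] as $binary) {
    $path = findBinary($binary);
    echo $binary . ': ' . ($path ?? 'not found') . PHP_EOL;
}
```

If either binary is reported as not found, install poppler-utils as described under Installation, or point the package at a custom location with the `pdftohtml_path` and `pdfinfo_path` options.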

Installation

  1. Install via Composer:

    composer require squareetlabs/laravel-pdf-to-html
  2. Install poppler-utils:

    • Ubuntu/Debian:
      sudo apt-get install poppler-utils
    • macOS:
      brew install poppler
    • CentOS/RHEL:
      sudo yum install poppler-utils
  3. Publish Configuration (Optional):

    php artisan vendor:publish --provider="Squareetlabs\LaravelPdfToHtml\Providers\PdfToHtmlServiceProvider"

Usage

Basic Usage

use Squareetlabs\LaravelPdfToHtml\Pdf;

try {
    // Create a new instance for the PDF to convert
    $pdf = new Pdf('/path/to/document.pdf');

    // Get the converted HTML (an object that exposes the pages)
    $html = $pdf->getHtml();

    // Get the content of all pages as an array
    $pages = $html->getAllPages();

    // Get the content of a specific page
    $page1 = $html->getPage(1);

    echo $page1;
} catch (\Exception $e) {
    echo "Error: " . $e->getMessage();
}
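
A common next step is writing each page to its own file. The sketch below assumes `getAllPages()` yields an array of HTML strings keyed by page number; the README does not state the exact shape, so the example runs on stand-in data. `savePages` is a hypothetical helper, not part of the package.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: write each page's HTML to "<dir>/page-<key>.html".
 *
 * @param array<int|string, string> $pages HTML per page, keyed by page number
 * @return string[] Paths of the files that were written
 */
function savePages(array $pages, string $dir): array
{
    // Create the target directory if needed (re-check guards against races).
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: {$dir}");
    }

    $written = [];
    foreach ($pages as $number => $html) {
        $path = $dir . DIRECTORY_SEPARATOR . "page-{$number}.html";
        if (file_put_contents($path, $html) === false) {
            throw new RuntimeException("Cannot write file: {$path}");
        }
        $written[] = $path;
    }

    return $written;
}

// Stand-in data; in practice this would come from $html->getAllPages().
$pages = [1 => '<p>First page</p>', 2 => '<p>Second page</p>'];
$paths = savePages($pages, sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'pdf-pages');

echo implode(PHP_EOL, $paths) . PHP_EOL;
```

In a Laravel application the target directory would more likely be something under `storage_path()`; the temp directory is used here only to keep the sketch self-contained.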

Advanced Options

You can pass options to the constructor to customize the behavior:

$options = [
    'pdftohtml_path' => '/usr/custom/bin/pdftohtml', // Optional custom path
    'pdfinfo_path' => '/usr/custom/bin/pdfinfo',     // Optional custom path
    'generate' => [
        'singlePage' => false,      // Split pages (default)
        'imageJpeg' => true,        // Convert images to JPEG
        'ignoreImages' => false,    // Keep images
        'zoom' => 1.5,              // Zoom factor
        'noFrames' => true,         // Output without frames
    ],
    'html' => [
        'inlineCss' => true,        // Inline CSS into style attributes
        'inlineImages' => true,     // Convert images to Base64
        'onlyContent' => true,      // Return only body content
    ],
    'clearAfter' => true,           // Clear temp files after processing
];

$pdf = new Pdf('/path/to/document.pdf', $options);
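
To illustrate what `inlineImages` means in practice: inlining an image as Base64 generally amounts to replacing its file reference with a `data:` URI. How the package does this internally is not shown in this README, so the following is only a sketch of the general technique, with `toDataUri` as a hypothetical helper and placeholder bytes standing in for a real image file.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build a data: URI from raw bytes and a MIME type.
 */
function toDataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Placeholder bytes; a real caller would read them from an image file,
// e.g. with file_get_contents().
$bytes = 'hello';

// An <img> tag that carries its own data and needs no external file.
$tag = '<img src="' . toDataUri($bytes, 'image/png') . '" alt="">';

echo $tag . PHP_EOL;
```

The trade-off is size: Base64 output is roughly a third larger than the original bytes, so a self-contained page costs more to store and transfer than one that links to separate image files.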

Get PDF Info

// Get document metadata
$info = $pdf->getInfo();
// Returns array: ['pages' => 10, 'size' => '...', ...]

// Get the number of pages
$count = $pdf->countPages();
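
For context, the `pdfinfo` binary prints its results as `Key: value` lines. The sketch below shows one way such output could be turned into an array; it is not the package's implementation, the keys it produces need not match those returned by `getInfo()`, and `parsePdfInfo` and the sample text are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parse "Key: value" lines into an associative array.
 *
 * @return array<string, string>
 */
function parsePdfInfo(string $output): array
{
    $info = [];
    foreach (preg_split('/\R/', $output) ?: [] as $line) {
        if (!str_contains($line, ':')) {
            continue; // Skip blank or malformed lines.
        }
        // Split on the first colon only, so values may contain colons.
        [$key, $value] = explode(':', $line, 2);
        $info[trim($key)] = trim($value);
    }

    return $info;
}

// Sample text in the style of pdfinfo output (made up for illustration).
$sample = "Title:          Example\n"
    . "CreationDate:   Mon Jan  1 12:00:00 2024\n"
    . "Pages:          10\n";

$info = parsePdfInfo($sample);
$pageCount = (int) ($info['Pages'] ?? 0);

echo 'Pages: ' . $pageCount . PHP_EOL;
```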

Testing

composer test

License

MIT
