add pdf converter by andrepetsch · Pull Request #42 · andrepetsch/secret-library

andrepetsch · 2025-12-19T17:47:53Z

No description provided.

vercel · 2025-12-19T17:47:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
secret-library	Ready	Preview, Comment	Dec 19, 2025 5:53pm

Copilot

Pull request overview

This PR adds PDF to EPUB conversion functionality to the media library application. The implementation includes a conversion library and an API endpoint that allows users to convert existing PDF files to EPUB format.

Key Changes

New pdf-to-epub.ts library that converts PDF buffers to EPUB format using pdf-lib for PDF parsing and @lesjoursfr/html-to-epub for EPUB generation
New /api/media/convert-pdf-to-epub API endpoint supporting both single media item conversion and batch conversion of all PDFs
Integration with existing media file management system to store generated EPUBs alongside original PDFs

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 18 comments.

File	Description
src/lib/pdf-to-epub.ts	Core conversion library that extracts PDF metadata and structure, generates HTML content, and creates EPUB files. Currently creates placeholder content rather than extracting actual PDF text.
src/app/api/media/convert-pdf-to-epub/route.ts	API endpoint that handles conversion requests, fetches PDF files from blob storage, invokes the conversion library, and stores resulting EPUBs back to the media database.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-19T17:54:04Z

+    const body = await req.json()
+    const { mediaId, all } = body


The API lacks input validation for the request body structure. When 'all' is provided, it should be validated as a boolean type. Currently, any truthy value for 'all' (including strings like 'false' or 'no') would trigger batch conversion. Add explicit type validation to ensure 'all' is a boolean value when present.

Copilot · 2025-12-19T17:54:04Z

+// eslint-disable-next-line @typescript-eslint/no-explicit-any
+async function extractPdfContent(pdfDoc: any, pageCount: number): Promise<string> {


Using 'any' type for pdfDoc parameter loses type safety and makes the code harder to maintain. While the comment acknowledges that pdf-lib doesn't have built-in text extraction, the type could be more specific. Consider importing and using the actual PDFDocument type from pdf-lib for better type safety and IDE autocomplete support.

Copilot · 2025-12-19T17:54:04Z

+  for (let i = 1; i <= pageCount; i++) {
+    htmlContent += `<div class="page">\n`
+    htmlContent += `<h2>Page ${i}</h2>\n`
+    htmlContent += `<p><em>[Content from page ${i} of the original PDF]</em></p>\n`
+    htmlContent += `</div>\n\n`
+  }


The conversion process creates placeholder content instead of extracting actual text from the PDF. This means users will get an EPUB with only page markers and no actual content. While the comments acknowledge this limitation, the API provides no indication to callers that the conversion is incomplete. Consider either implementing actual text extraction (e.g., using pdfjs-dist), or returning a warning/status field in the response indicating the conversion is limited to structure only.

Copilot · 2025-12-19T17:54:04Z

+    if (mediaId) {
+      // Convert a specific media item
+      console.log('[Convert PDF to EPUB] Converting single media:', mediaId)
+      const result = await convertMediaPdfToEpub(mediaId)
+      results.push(result)
+    } else if (all) {
+      // Convert all PDFs that don't have EPUBs yet
+      console.log('[Convert PDF to EPUB] Converting all PDFs without EPUBs')
+
+      // Find all media items that have PDF files but no EPUB files
+      const mediaWithPdfs = await prisma.media.findMany({
+        where: {
+          deletedAt: null,
+          files: {
+            some: {
+              fileType: 'pdf'
+            }
+          }
+        },
+        include: {
+          files: true
+        }
+      })
+
+      console.log(`[Convert PDF to EPUB] Found ${mediaWithPdfs.length} media items with PDFs`)
+
+      // Filter to only those without EPUBs
+      const mediaNeedingConversion = mediaWithPdfs.filter(media => 
+        !media.files.some(file => file.fileType === 'epub')
+      )
+
+      console.log(`[Convert PDF to EPUB] ${mediaNeedingConversion.length} need EPUB conversion`)
+
+      // Convert each one
+      for (const media of mediaNeedingConversion) {
+        const result = await convertMediaPdfToEpub(media.id)
+        results.push(result)
+      }


This endpoint lacks authorization checks to verify that the user has permission to convert the specified media items. Unlike other media endpoints in this codebase (e.g., /api/media/[id]/route.ts), which check that existingMedia.uploadedBy === session.user.id, this endpoint allows any authenticated user to convert any media item by ID or all PDFs in the system. This could allow users to trigger expensive conversion operations on media they don't own, and potentially enumerate or access other users' media. Add ownership verification before processing conversions.

Copilot · 2025-12-19T17:54:05Z

+    } else if (all) {
+      // Convert all PDFs that don't have EPUBs yet
+      console.log('[Convert PDF to EPUB] Converting all PDFs without EPUBs')
+
+      // Find all media items that have PDF files but no EPUB files
+      const mediaWithPdfs = await prisma.media.findMany({
+        where: {
+          deletedAt: null,
+          files: {
+            some: {
+              fileType: 'pdf'
+            }
+          }
+        },
+        include: {
+          files: true
+        }
+      })
+
+      console.log(`[Convert PDF to EPUB] Found ${mediaWithPdfs.length} media items with PDFs`)
+
+      // Filter to only those without EPUBs
+      const mediaNeedingConversion = mediaWithPdfs.filter(media => 
+        !media.files.some(file => file.fileType === 'epub')
+      )
+
+      console.log(`[Convert PDF to EPUB] ${mediaNeedingConversion.length} need EPUB conversion`)
+
+      // Convert each one
+      for (const media of mediaNeedingConversion) {
+        const result = await convertMediaPdfToEpub(media.id)
+        results.push(result)
+      }


The 'all' parameter allows any authenticated user to trigger batch conversion of all PDFs in the system without rate limiting or pagination. This could cause performance issues, excessive resource consumption, and high costs in serverless environments. The operation processes conversions sequentially in a loop (line 70-73) which could timeout for large datasets. Consider adding: (1) pagination or batch size limits, (2) rate limiting, (3) restricting to user's own media only, or (4) implementing this as a background job rather than a synchronous API call.

Copilot · 2025-12-19T17:54:07Z

@@ -0,0 +1,160 @@
+import { EPub } from '@lesjoursfr/html-to-epub'


The package '@lesjoursfr/html-to-epub' is imported but does not appear to be listed in package.json dependencies. This will cause a runtime error when the module is loaded. Please add this package to the dependencies in package.json.

Copilot · 2025-12-19T17:54:07Z

+    // Clean up temp file on error
+    await fs.unlink(tempFilePath).catch(() => {
+      // Ignore cleanup errors
+    })


The function creates a temporary file and attempts to clean it up in both the success path (line 81) and error path (line 88). However, if the file creation never happens (e.g., epub.render() fails before writing), attempting to unlink a non-existent file could cause issues. While the catch blocks ignore errors, this could mask legitimate cleanup issues. Consider checking if the file exists before attempting to delete it, or tracking whether the file was successfully created.

Copilot · 2025-12-19T17:54:08Z

+    // Check if EPUB already exists
+    const epubFile = media.files.find(f => f.fileType === 'epub')
+    if (epubFile) {
+      return {
+        mediaId,
+        success: false,
+        error: 'EPUB file already exists for this media'
+      }
+    }


When checking if an EPUB already exists (line 133-139), the function returns an error. However, this might not be the desired behavior if a user wants to re-convert a PDF (e.g., after the conversion logic is improved to include actual text extraction). Consider making this check optional through a 'force' or 'overwrite' parameter, or returning success with a message indicating the EPUB already exists rather than treating it as an error condition.

Copilot · 2025-12-19T17:54:08Z

+    console.log(`[PDF to EPUB] Converting PDF with ${pageCount} pages`)
+
+    // Extract text from PDF pages
+    // Note: pdf-lib doesn't have built-in text extraction
+    // We'll create a simple HTML representation based on page structure
+    const htmlContent = await extractPdfContent(pdfDoc, pageCount)
+
+    // Prepare EPUB metadata
+    const title = options.title || 'Untitled'
+    const author = options.author || 'Unknown Author'
+    const description = options.description || ''
+    const language = options.language || 'en'
+
+    console.log(`[PDF to EPUB] Creating EPUB with title: ${title}`)
+
+    // Create EPUB options
+    const epubOptions = {
+      title,
+      author,
+      description,
+      lang: language,
+      content: [
+        {
+          title: 'Content',
+          data: htmlContent,
+        },
+      ],
+      verbose: false,
+    }
+
+    // Generate EPUB (pass output as second argument)
+    const epub = new EPub(epubOptions, tempFilePath)
+    await epub.render()
+
+    console.log('[PDF to EPUB] EPUB file created at:', tempFilePath)
+
+    // Read the EPUB file into a buffer
+    const epubBuffer = await fs.readFile(tempFilePath)
+
+    console.log('[PDF to EPUB] EPUB buffer created, size:', epubBuffer.length)


Console.log statements are used throughout for logging (lines 39, 52, 73, 78). In a production application, consider using a structured logging library that supports different log levels (debug, info, warn, error) and can be configured per environment. This would make it easier to control verbosity and integrate with monitoring tools.

Copilot · 2025-12-19T17:54:08Z

+  options: ConversionOptions = {}
+): Promise<Buffer> {
+  // Create a temporary file path for the EPUB
+  const tempDir = '/tmp'


The hardcoded '/tmp' directory path may not exist or be writable in all deployment environments, particularly in serverless platforms like Vercel where the filesystem is read-only except for the '/tmp' directory which has size and persistence limitations. Consider using the system's temporary directory (os.tmpdir()) for better cross-platform compatibility, or ensure this code is aware of the deployment environment's constraints.

andrepetsch · 2025-12-19T18:08:05Z

@copilot How gets this mechanism triggered?

Copilot · 2025-12-19T18:08:13Z

@andrepetsch I've opened a new pull request, #43, to work on those changes. Once the pull request is ready, I'll request review from you.

add pdf converter

91329d2

Copilot AI review requested due to automatic review settings December 19, 2025 17:47

Copilot started reviewing on behalf of andrepetsch December 19, 2025 17:48 View session

added package

cdfb744

vercel Bot deployed to Preview December 19, 2025 17:51 View deployment

vercel Bot deployed to Production December 19, 2025 17:53 View deployment

Copilot AI reviewed Dec 19, 2025

View reviewed changes

Copilot AI mentioned this pull request Dec 19, 2025

Add UI button to convert PDF files to EPUB format #43

Draft

3 tasks

andrepetsch linked an issue Dec 28, 2025 that may be closed by this pull request

Create EPUBs based on PDFs #33

Open

		// eslint-disable-next-line @typescript-eslint/no-explicit-any
		async function extractPdfContent(pdfDoc: any, pageCount: number): Promise<string> {

		@@ -0,0 +1,160 @@
		import { EPub } from '@lesjoursfr/html-to-epub'

Conversation

andrepetsch commented Dec 19, 2025

Uh oh!

vercel Bot commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

Uh oh!

andrepetsch commented Dec 19, 2025

Uh oh!

Copilot AI commented Dec 19, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vercel Bot commented Dec 19, 2025 •

edited

Loading