Skip to content

add pdf converter#42

Open
andrepetsch wants to merge 2 commits intomainfrom
PDF-to-EPUB
Open

add pdf converter#42
andrepetsch wants to merge 2 commits intomainfrom
PDF-to-EPUB

Conversation

@andrepetsch
Copy link
Copy Markdown
Owner

No description provided.

Copilot AI review requested due to automatic review settings December 19, 2025 17:47
@vercel
Copy link
Copy Markdown

vercel Bot commented Dec 19, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
secret-library Ready Ready Preview, Comment Dec 19, 2025 5:53pm

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds PDF to EPUB conversion functionality to the media library application. The implementation includes a conversion library and an API endpoint that allows users to convert existing PDF files to EPUB format.

Key Changes

  • New pdf-to-epub.ts library that converts PDF buffers to EPUB format using pdf-lib for PDF parsing and @lesjoursfr/html-to-epub for EPUB generation
  • New /api/media/convert-pdf-to-epub API endpoint supporting both single media item conversion and batch conversion of all PDFs
  • Integration with existing media file management system to store generated EPUBs alongside original PDFs

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 18 comments.

File Description
src/lib/pdf-to-epub.ts Core conversion library that extracts PDF metadata and structure, generates HTML content, and creates EPUB files. Currently creates placeholder content rather than extracting actual PDF text.
src/app/api/media/convert-pdf-to-epub/route.ts API endpoint that handles conversion requests, fetches PDF files from blob storage, invokes the conversion library, and stores resulting EPUBs back to the media database.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +23 to +24
const body = await req.json()
const { mediaId, all } = body
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API lacks input validation for the request body structure. When 'all' is provided, it should be validated as a boolean type. Currently, any truthy value for 'all' (including strings like 'false' or 'no') would trigger batch conversion. Add explicit type validation to ensure 'all' is a boolean value when present.

Copilot uses AI. Check for mistakes.
Comment thread src/lib/pdf-to-epub.ts
Comment on lines +103 to +104
// eslint-disable-next-line @typescript-eslint/no-explicit-any
async function extractPdfContent(pdfDoc: any, pageCount: number): Promise<string> {
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 'any' type for pdfDoc parameter loses type safety and makes the code harder to maintain. While the comment acknowledges that pdf-lib doesn't have built-in text extraction, the type could be more specific. Consider importing and using the actual PDFDocument type from pdf-lib for better type safety and IDE autocomplete support.

Copilot uses AI. Check for mistakes.
Comment thread src/lib/pdf-to-epub.ts
Comment on lines +136 to +141
for (let i = 1; i <= pageCount; i++) {
htmlContent += `<div class="page">\n`
htmlContent += `<h2>Page ${i}</h2>\n`
htmlContent += `<p><em>[Content from page ${i} of the original PDF]</em></p>\n`
htmlContent += `</div>\n\n`
}
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conversion process creates placeholder content instead of extracting actual text from the PDF. This means users will get an EPUB with only page markers and no actual content. While the comments acknowledge this limitation, the API provides no indication to callers that the conversion is incomplete. Consider either implementing actual text extraction (e.g., using pdfjs-dist), or returning a warning/status field in the response indicating the conversion is limited to structure only.

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +73
if (mediaId) {
// Convert a specific media item
console.log('[Convert PDF to EPUB] Converting single media:', mediaId)
const result = await convertMediaPdfToEpub(mediaId)
results.push(result)
} else if (all) {
// Convert all PDFs that don't have EPUBs yet
console.log('[Convert PDF to EPUB] Converting all PDFs without EPUBs')

// Find all media items that have PDF files but no EPUB files
const mediaWithPdfs = await prisma.media.findMany({
where: {
deletedAt: null,
files: {
some: {
fileType: 'pdf'
}
}
},
include: {
files: true
}
})

console.log(`[Convert PDF to EPUB] Found ${mediaWithPdfs.length} media items with PDFs`)

// Filter to only those without EPUBs
const mediaNeedingConversion = mediaWithPdfs.filter(media =>
!media.files.some(file => file.fileType === 'epub')
)

console.log(`[Convert PDF to EPUB] ${mediaNeedingConversion.length} need EPUB conversion`)

// Convert each one
for (const media of mediaNeedingConversion) {
const result = await convertMediaPdfToEpub(media.id)
results.push(result)
}
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint lacks authorization checks to verify that the user has permission to convert the specified media items. Unlike other media endpoints in this codebase (e.g., /api/media/[id]/route.ts), which check that existingMedia.uploadedBy === session.user.id, this endpoint allows any authenticated user to convert any media item by ID or all PDFs in the system. This could allow users to trigger expensive conversion operations on media they don't own, and potentially enumerate or access other users' media. Add ownership verification before processing conversions.

Copilot uses AI. Check for mistakes.
Comment on lines +41 to +73
} else if (all) {
// Convert all PDFs that don't have EPUBs yet
console.log('[Convert PDF to EPUB] Converting all PDFs without EPUBs')

// Find all media items that have PDF files but no EPUB files
const mediaWithPdfs = await prisma.media.findMany({
where: {
deletedAt: null,
files: {
some: {
fileType: 'pdf'
}
}
},
include: {
files: true
}
})

console.log(`[Convert PDF to EPUB] Found ${mediaWithPdfs.length} media items with PDFs`)

// Filter to only those without EPUBs
const mediaNeedingConversion = mediaWithPdfs.filter(media =>
!media.files.some(file => file.fileType === 'epub')
)

console.log(`[Convert PDF to EPUB] ${mediaNeedingConversion.length} need EPUB conversion`)

// Convert each one
for (const media of mediaNeedingConversion) {
const result = await convertMediaPdfToEpub(media.id)
results.push(result)
}
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'all' parameter allows any authenticated user to trigger batch conversion of all PDFs in the system without rate limiting or pagination. This could cause performance issues, excessive resource consumption, and high costs in serverless environments. The operation processes conversions sequentially in a loop (line 70-73) which could timeout for large datasets. Consider adding: (1) pagination or batch size limits, (2) rate limiting, (3) restricting to user's own media only, or (4) implementing this as a background job rather than a synchronous API call.

Copilot uses AI. Check for mistakes.
Comment thread src/lib/pdf-to-epub.ts
@@ -0,0 +1,160 @@
import { EPub } from '@lesjoursfr/html-to-epub'
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package '@lesjoursfr/html-to-epub' is imported but does not appear to be listed in package.json dependencies. This will cause a runtime error when the module is loaded. Please add this package to the dependencies in package.json.

Copilot uses AI. Check for mistakes.
Comment thread src/lib/pdf-to-epub.ts
Comment on lines +87 to +90
// Clean up temp file on error
await fs.unlink(tempFilePath).catch(() => {
// Ignore cleanup errors
})
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function creates a temporary file and attempts to clean it up in both the success path (line 81) and error path (line 88). However, if the file creation never happens (e.g., epub.render() fails before writing), attempting to unlink a non-existent file could cause issues. While the catch blocks ignore errors, this could mask legitimate cleanup issues. Consider checking if the file exists before attempting to delete it, or tracking whether the file was successfully created.

Copilot uses AI. Check for mistakes.
Comment on lines +132 to +140
// Check if EPUB already exists
const epubFile = media.files.find(f => f.fileType === 'epub')
if (epubFile) {
return {
mediaId,
success: false,
error: 'EPUB file already exists for this media'
}
}
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When checking if an EPUB already exists (line 133-139), the function returns an error. However, this might not be the desired behavior if a user wants to re-convert a PDF (e.g., after the conversion logic is improved to include actual text extraction). Consider making this check optional through a 'force' or 'overwrite' parameter, or returning success with a message indicating the EPUB already exists rather than treating it as an error condition.

Copilot uses AI. Check for mistakes.
Comment thread src/lib/pdf-to-epub.ts
Comment on lines +39 to +78
console.log(`[PDF to EPUB] Converting PDF with ${pageCount} pages`)

// Extract text from PDF pages
// Note: pdf-lib doesn't have built-in text extraction
// We'll create a simple HTML representation based on page structure
const htmlContent = await extractPdfContent(pdfDoc, pageCount)

// Prepare EPUB metadata
const title = options.title || 'Untitled'
const author = options.author || 'Unknown Author'
const description = options.description || ''
const language = options.language || 'en'

console.log(`[PDF to EPUB] Creating EPUB with title: ${title}`)

// Create EPUB options
const epubOptions = {
title,
author,
description,
lang: language,
content: [
{
title: 'Content',
data: htmlContent,
},
],
verbose: false,
}

// Generate EPUB (pass output as second argument)
const epub = new EPub(epubOptions, tempFilePath)
await epub.render()

console.log('[PDF to EPUB] EPUB file created at:', tempFilePath)

// Read the EPUB file into a buffer
const epubBuffer = await fs.readFile(tempFilePath)

console.log('[PDF to EPUB] EPUB buffer created, size:', epubBuffer.length)
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Console.log statements are used throughout for logging (lines 39, 52, 73, 78). In a production application, consider using a structured logging library that supports different log levels (debug, info, warn, error) and can be configured per environment. This would make it easier to control verbosity and integrate with monitoring tools.

Copilot uses AI. Check for mistakes.
Comment thread src/lib/pdf-to-epub.ts
options: ConversionOptions = {}
): Promise<Buffer> {
// Create a temporary file path for the EPUB
const tempDir = '/tmp'
Copy link

Copilot AI Dec 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hardcoded '/tmp' directory path may not exist or be writable in all deployment environments, particularly in serverless platforms like Vercel where the filesystem is read-only except for the '/tmp' directory which has size and persistence limitations. Consider using the system's temporary directory (os.tmpdir()) for better cross-platform compatibility, or ensure this code is aware of the deployment environment's constraints.

Copilot uses AI. Check for mistakes.
@andrepetsch
Copy link
Copy Markdown
Owner Author

@copilot How gets this mechanism triggered?

Copy link
Copy Markdown
Contributor

Copilot AI commented Dec 19, 2025

@andrepetsch I've opened a new pull request, #43, to work on those changes. Once the pull request is ready, I'll request review from you.

@andrepetsch andrepetsch linked an issue Dec 28, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create EPUBs based on PDFs

3 participants