Skip to content

Optimize PDF files to reduce repository size while maintaining readability#1

Draft
Copilot wants to merge 3 commits intomainfrom
copilot/fix-10a81d4e-d9fd-40df-9211-64ba5ad54485
Draft

Optimize PDF files to reduce repository size while maintaining readability#1
Copilot wants to merge 3 commits intomainfrom
copilot/fix-10a81d4e-d9fd-40df-9211-64ba5ad54485

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jul 25, 2025

This PR addresses the repository size issue by optimizing all PDF files to be smaller and more efficient while ensuring they remain parsable and searchable.

Problem

The repository contained 31 PDF files totaling 277.8 MB, making it large for GitHub hosting and slow to clone. Some individual files were very large (up to 38MB), approaching GitHub's file size limits.

Solution

Implemented comprehensive PDF optimization using Ghostscript compression with intelligent settings based on file size:

  • Large files (>10MB): Applied /screen compression for maximum size reduction
  • Medium files (1-10MB): Used /ebook compression for balanced quality/size ratio
  • Small files (<1MB): Applied light optimization or kept original when beneficial

Results

  • 33% total size reduction: 277.8 MB → 186.2 MB (saved 91.6 MB)
  • All 31 PDFs verified as readable and searchable after compression
  • No content loss: Text extraction and readability confirmed for all files

Notable Compression Achievements

  • Archit Neurology section: 25MB → 3MB (88% reduction)
  • ADA Glycemic Goals: 3MB → 676KB (77% reduction)
  • Cranial Nerves: 18MB → 7MB (62% reduction)
  • Harrison's Gastroenterology: 16MB → 8MB (50% reduction)
  • Harrison's General: 23MB → 12MB (48% reduction)

Technical Details

  • Used Ghostscript (gs) and qpdf for optimization
  • Implemented safety checks to revert files that increased in size
  • Maintained backup copies during optimization process
  • Verified text extraction capability on all compressed files

The repository is now more efficient for GitHub hosting while preserving all medical exam preparation content in a fully searchable format.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https (dns block)

If you need me to access, download, or install something from one of these locations, you can either:


💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.

Copilot AI and others added 2 commits July 25, 2025 19:05
Co-authored-by: PostgraduateAvi <221811199+PostgraduateAvi@users.noreply.github.com>
Co-authored-by: PostgraduateAvi <221811199+PostgraduateAvi@users.noreply.github.com>
Copilot AI changed the title [WIP] Convert exisisting pdfs inside repo to parsable pdfs to be small enough to fit in space.Convert existing PDFs in the repository to be parsable and ensure they are small enough to fit the available space. Optimize PDF files to reduce repository size while maintaining readability Jul 25, 2025
Copilot AI requested a review from PostgraduateAvi July 25, 2025 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants