This project is a simple Node.js API that converts a remote PDF document into Markdown text.
- Exposes an HTTP API with two endpoints:
  - GET /convert?url=<PDF_URL>
  - GET /<encoded_PDF_URL> (for URLs encoded in the path, e.g. https://monsuperservice.com/https%3A%2F%2Fexample.com%2Ffile.pdf)
- Extracts text from the PDF using pdfjs-dist
- Converts headings, lists, and paragraphs to Markdown
- Handles multi-page documents (adds --- as page separator)
- CORS enabled (open)
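
The two endpoint forms listed above both carry the same piece of information: the address of the remote PDF. The sketch below shows one way a request path could be mapped to that address. `resolvePdfUrl` is a hypothetical helper written for illustration, not the project's actual routing code, and restricting targets to `http:` and `https:` is an assumption rather than documented behavior.

```javascript
// Hypothetical helper, not taken from the project's source.
// Maps either endpoint form to the absolute URL of the PDF, or null if invalid.
function resolvePdfUrl(requestPath) {
  // Parse against a placeholder base so the WHATWG URL API accepts a bare path.
  const parsed = new URL(requestPath, "http://localhost");
  let candidate = null;

  if (parsed.pathname === "/convert") {
    // Form 1: GET /convert?url=<PDF_URL> (searchParams decodes the value).
    candidate = parsed.searchParams.get("url");
  } else if (parsed.pathname.length > 1) {
    // Form 2: GET /<encoded_PDF_URL> (the whole path is the encoded URL).
    try {
      candidate = decodeURIComponent(parsed.pathname.slice(1));
    } catch {
      return null; // malformed percent-encoding
    }
  }

  if (!candidate) return null;

  try {
    const target = new URL(candidate);
    // Assumption: only http(s) targets are accepted.
    const allowed = target.protocol === "http:" || target.protocol === "https:";
    return allowed ? target.href : null;
  } catch {
    return null; // not an absolute URL
  }
}

console.log(resolvePdfUrl("/convert?url=https%3A%2F%2Fexample.com%2Ffile.pdf"));
console.log(resolvePdfUrl("/https%3A%2F%2Fexample.com%2Ffile.pdf"));
```

Both calls resolve to the same target, so the rest of the conversion pipeline would not need to know which endpoint was used.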
git clone <repo_url>
cd pdf-to-markdown-service
npm install

Start the server:

npm start

By default, the server runs on http://localhost:3000
# Convert via query parameter
curl -L "http://localhost:3000/convert?url=https%3A%2F%2Fwww.w3.org%2FWAI%2FER%2Ftests%2Fxhtml%2Ftestfiles%2Fresources%2Fpdf%2Fdummy.pdf"
# Convert via encoded URL in path
curl -L "http://localhost:3000/https%3A%2F%2Fwww.w3.org%2FWAI%2FER%2Ftests%2Fxhtml%2Ftestfiles%2Fresources%2Fpdf%2Fdummy.pdf"

Requirements:
- Node.js >= 20
- Internet access to fetch remote PDFs
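
The curl calls shown earlier can also be made from Node.js itself, since the required Node.js version provides a global `fetch`. The following is a minimal client sketch. `buildConvertUrl` and `convertPdf` are hypothetical names, the base address is the default one mentioned above, and it is an assumption that the response body is the Markdown text itself rather than a JSON wrapper.

```javascript
// Hypothetical client helpers, written for illustration only.

// Builds the /convert request URL; searchParams.set percent-encodes the PDF URL.
function buildConvertUrl(pdfUrl, base = "http://localhost:3000") {
  const endpoint = new URL("/convert", base);
  endpoint.searchParams.set("url", pdfUrl);
  return endpoint.href;
}

// Calls the service and returns the response body as text.
// Assumption: the body is the converted Markdown.
async function convertPdf(pdfUrl, base) {
  const response = await fetch(buildConvertUrl(pdfUrl, base));
  if (!response.ok) {
    throw new Error(`Conversion failed with HTTP ${response.status}`);
  }
  return response.text();
}

console.log(buildConvertUrl("https://example.com/file.pdf"));
// Usage (requires the server to be running):
// convertPdf("https://example.com/file.pdf").then(console.log, console.error);
```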

Limitations:
- Only extracts text (no OCR for scanned PDFs)
- Complex layouts (tables, multi-column) may be simplified
- Heading detection is heuristic (based on relative font size)
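
To make the font-size heuristic concrete, here is a minimal sketch of how headings could be inferred from relative font size and how pages could be joined with `---`. It is not the project's implementation: the input shape (`{ text, fontSize }` per line), the function names, and the ratio thresholds are all assumptions chosen for illustration. In practice the font size would have to be derived from the text items that pdfjs-dist returns from `getTextContent()`.

```javascript
// Hypothetical sketch, not the project's code.
// Input: an array of lines, each { text: string, fontSize: number }.
function linesToMarkdown(lines) {
  // Treat the most frequent font size on the page as the body-text baseline.
  const counts = new Map();
  for (const { fontSize } of lines) {
    counts.set(fontSize, (counts.get(fontSize) || 0) + 1);
  }
  let bodySize = 0;
  let bestCount = 0;
  for (const [size, count] of counts) {
    if (count > bestCount) {
      bestCount = count;
      bodySize = size;
    }
  }

  // Assumed thresholds: the larger the text relative to the body, the higher the heading level.
  return lines
    .map(({ text, fontSize }) => {
      const ratio = fontSize / bodySize;
      if (ratio >= 1.6) return `# ${text}`;
      if (ratio >= 1.3) return `## ${text}`;
      if (ratio >= 1.1) return `### ${text}`;
      return text;
    })
    .join("\n\n");
}

// Joins per-page Markdown with a horizontal rule as the page separator.
function pagesToMarkdown(pages) {
  return pages.map(linesToMarkdown).join("\n\n---\n\n");
}

const pages = [
  [
    { text: "Report", fontSize: 24 },
    { text: "Intro", fontSize: 16 },
    { text: "Body a", fontSize: 12 },
    { text: "Body b", fontSize: 12 },
  ],
  [{ text: "Next page", fontSize: 12 }],
];
console.log(pagesToMarkdown(pages));
```

Because the baseline is computed per page in this sketch, a page set entirely in large type would be read as body text, which illustrates why such detection can misclassify headings.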
Made with ❤️ in Node.js