A static, single-file website that converts scanned PDFs to Markdown or DOCX using GLM-OCR — a 0.9B parameter vision-language model that ranks #1 on OmniDocBench V1.5 — running locally via Ollama.
No cloud services. No API keys. Everything runs on your machine.
```
┌──────────────────┐                   ┌──────────────────┐
│     Browser      │      REST API     │  Ollama Server   │
│                  │     (streaming)   │                  │
│ PDF.js renders   │ ────────────────► │  GLM-OCR model   │
│ pages to JPEG    │    base64 image   │  (0.9B params)   │
│                  │                   │                  │
│ marked.js shows  │ ◄──────────────── │  Runs on CPU     │
│ live preview     │   Markdown text   │  inside Docker/  │
│                  │                   │  Podman          │
│ docx.js builds   │                   │                  │
│ Word exports     │                   │ localhost:11434  │
└──────────────────┘                   └──────────────────┘
```
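The browser-to-Ollama handoff in the diagram can be sketched in a few lines. The PDF.js calls are shown as comments (they need a DOM canvas to run), and the helper name is illustrative, not the project's actual code. The one non-obvious detail: Ollama expects raw base64 with no data-URL prefix.

```javascript
// Sketch of the browser-side flow (illustrative, not the project's code).
// With PDF.js a page is rendered to a canvas roughly like:
//   const page = await pdf.getPage(n);
//   const viewport = page.getViewport({ scale });
//   await page.render({ canvasContext: ctx, viewport }).promise;
//   const dataUrl = canvas.toDataURL("image/jpeg", 0.9);
//
// canvas.toDataURL() returns "data:image/jpeg;base64,<payload>",
// but Ollama's API wants only the <payload> part:
function dataUrlToBase64(dataUrl) {
  const comma = dataUrl.indexOf(",");
  return comma === -1 ? dataUrl : dataUrl.slice(comma + 1);
}
```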
- Drop a scanned PDF → each page is rendered to a JPEG, sent to GLM-OCR, and the recognized text streams back live
- Markdown output with headings, tables, and formatting preserved
- Export to `.md` or `.docx` with one click
- No build step — the entire UI is a single `index.html` file with ESM imports from CDNs
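The live streaming works because Ollama's `/api/generate` endpoint emits newline-delimited JSON, each object carrying a `response` text fragment; the preview is driven by concatenating those fragments as chunks arrive. A minimal sketch, assuming Ollama's documented streaming format (the helper name is illustrative):

```javascript
// Each streamed line is a JSON object like {"response":"...","done":false}.
// Concatenate the "response" fragments from one decoded chunk of the stream.
function extractStreamedText(ndjsonChunk) {
  let text = "";
  for (const line of ndjsonChunk.split("\n")) {
    if (!line.trim()) continue;
    const obj = JSON.parse(line);
    if (obj.response) text += obj.response;
  }
  return text;
}

// In the browser, the request itself would look roughly like:
//   await fetch("http://localhost:11434/api/generate", {
//     method: "POST",
//     body: JSON.stringify({
//       model: "glm-ocr-optimized",
//       prompt: "Text recognition:",
//       images: [base64Jpeg],   // raw base64, no data-URL prefix
//       stream: true,
//     }),
//   });
// with each decoded chunk fed through extractStreamedText().
```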
You need two things installed:
| Requirement | Why | Install |
|---|---|---|
| Docker or Podman | Runs the Ollama server + GLM-OCR model | Docker Desktop or Podman |
| A local HTTP server | Browsers block `file://` → `localhost` requests (CORS) | Python 3, Node.js, PHP, or any static server |
The setup-glm-ocr.sh script automates everything else: pulling the model (~2-4 GB), detecting your CPU cores, and creating the optimized configuration.
```bash
git clone <repo-url>
cd browser-pdf-ocr
```

The script handles Docker/Podman detection, Podman VM setup (macOS/Windows), model download, and hardware-optimized configuration, all automatically.
macOS includes bash and python3 out of the box. If you use Podman, the script will initialize and start the Linux VM for you.
```bash
# Make the script executable (first time only)
chmod +x setup-glm-ocr.sh

# Run it
./setup-glm-ocr.sh
```

**Podman users:** The script auto-detects if the Podman machine is stopped and starts it. It also checks the VM memory and increases it to 8 GB if needed (GLM-OCR requires ~4.2 GB).
Works identically to macOS. Docker typically runs natively (no VM), and Podman runs rootless without a machine.
```bash
chmod +x setup-glm-ocr.sh
./setup-glm-ocr.sh
```

**Permissions note:** If Docker requires `sudo`, either run the script with `sudo` or add your user to the `docker` group:

```bash
sudo usermod -aG docker $USER
# Log out and back in, then re-run the script
```
The script is a Bash script. You have two options:
Option A — Git Bash (recommended)
Git Bash ships with Git for Windows and can run .sh scripts directly. Make sure Docker Desktop or Podman is installed and running.
```bash
# In Git Bash
./setup-glm-ocr.sh
```

Note: `chmod +x` is not required in Git Bash — scripts are executable by default.
Option B — WSL (Windows Subsystem for Linux)
If you use WSL, the script runs exactly like on native Linux. Make sure Docker Desktop has the WSL 2 integration enabled (Settings → Resources → WSL Integration).
```bash
chmod +x setup-glm-ocr.sh
./setup-glm-ocr.sh
```

Option C — PowerShell (manual setup)
If you prefer not to use Bash, run the equivalent commands manually in PowerShell:
```powershell
# Pull and start Ollama
docker run -d --name ollama-server `
  -v ollama_storage:/root/.ollama `
  -p 11434:11434 `
  -e OLLAMA_ORIGINS="*" `
  ollama/ollama

# Pull the model
docker exec -it ollama-server ollama pull glm-ocr

# Detect your physical core count
(Get-CimInstance Win32_Processor | Measure-Object -Property NumberOfCores -Sum).Sum

# Create the optimized model (replace 10 with your core count)
docker exec -i ollama-server bash -c @"
cat > /tmp/GLM-Config << 'MODELFILE'
FROM glm-ocr
PARAMETER num_ctx 16384
PARAMETER num_thread 10
PARAMETER num_predict 8192
PARAMETER temperature 0
PARAMETER top_p 0.00001
PARAMETER top_k 1
PARAMETER repeat_penalty 1.1
MODELFILE
ollama create glm-ocr-optimized -f /tmp/GLM-Config
"@
```

After the setup script completes, it will suggest the right command based on what's installed on your system. The most common options:
```bash
# Python (macOS / Linux — usually pre-installed)
python3 -m http.server 8080

# Node.js
npx http-server -p 8080 -c-1

# PHP
php -S localhost:8080
```

Then open http://localhost:8080 in your browser.
**Why not just open the HTML file directly?** Browsers block cross-origin requests from `file://` URLs. The page needs to call Ollama at `http://localhost:11434`, which requires both ends to be served over HTTP.
**What about GitHub Pages or other hosted deployments?** A live demo is available on GitHub Pages, but browsers treat requests from a public origin (`https://...`) to `localhost` as Private Network Access and will block them. The page must be served from `localhost` to communicate with Ollama.
- Drop a scanned PDF onto the upload area
- Verify the Ollama status dot is green (●)
- Click ▶ Run OCR
- Watch pages process with live-streamed results
- Export as `.md` or `.docx`
`setup-glm-ocr.sh` runs through the following steps:
| Step | What it does |
|---|---|
| 0. Runtime | Detects Docker; falls back to Podman if Docker is unavailable |
| 0b. Podman VM | (macOS/Windows only) Initializes, starts, and resizes the Podman machine to 8 GB RAM |
| 1. CPU cores | Detects physical cores — sysctl (macOS), lscpu (Linux), Get-CimInstance (Windows) |
| 2. Container | Creates or starts the ollama-server container with OLLAMA_ORIGINS="*" for CORS |
| 3. Model pull | Downloads glm-ocr weights (~2-4 GB) |
| 4. Optimize | Creates glm-ocr-optimized with num_thread set to your physical core count |
| 5. Verify | Lists available models and suggests the right HTTP server command |
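The verification in step 5 can also be done by hand: Ollama's `/api/tags` endpoint returns a JSON object with a `models` array listing installed tags (Ollama appends `:latest` when no tag is given). A sketch of the check, with an illustrative helper name:

```javascript
// Check an /api/tags response for a given model (illustrative helper).
// GET http://localhost:11434/api/tags returns
//   {"models":[{"name":"glm-ocr-optimized:latest", ...}, ...]}
function hasModel(tagsResponse, wanted) {
  return (tagsResponse.models || []).some(
    (m) => m.name === wanted || m.name.startsWith(wanted + ":")
  );
}

// In the browser:
//   const tags = await (await fetch("http://localhost:11434/api/tags")).json();
//   hasModel(tags, "glm-ocr-optimized");
```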
The web UI has configurable settings in the left panel:
| Setting | Default | Description |
|---|---|---|
| Ollama endpoint | `http://localhost:11434` | Where Ollama is running |
| Model name | `glm-ocr-optimized` | The model tag to use |
| Max image edge | 1024 px | Longest edge before sending to OCR. Lower = faster, higher = more detail |
| Prompt | Text recognition: | The instruction sent to GLM-OCR |
Available prompts:
- Text recognition — plain text output in reading order (from the original article)
- OCR → Markdown — explicitly requests Markdown formatting with headings and tables
- Full extraction — broadest instruction, targeting text, tables, and formulas
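The "Max image edge" setting caps the longest side of each rendered page before it is sent to the model; the aspect ratio is preserved. The scaling math is simple enough to sketch (the function name is illustrative, not the project's code):

```javascript
// Scale (width, height) so the longest edge is at most maxEdge,
// preserving aspect ratio. Pages already small enough pass through.
function scaleToMaxEdge(width, height, maxEdge) {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

With the default of 1024 px, a 2048×1024 render is sent as 1024×512; lowering the cap to 768 shrinks the image the vision encoder must process, which is why it speeds up OCR at some cost in detail.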
Red status dot / "Cannot reach Ollama"
- Are you opening via `file://`? Use a local HTTP server (see step 3)
- Is Ollama running? Check with `curl http://localhost:11434`
- Is CORS enabled? The container must be started with `-e OLLAMA_ORIGINS="*"`
- Is the port mapped? Docker/Podman needs `-p 11434:11434`
500 error: "model requires more system memory"
- The Podman VM defaults to ~2 GB RAM but GLM-OCR needs ~4.2 GB
- Fix: `podman machine stop && podman machine set --memory 8192 && podman machine start`
- The updated setup script handles this automatically
Slow processing (2-3+ minutes per page)
- Expected on CPU — the vision encoder prefill is the bottleneck, not the text generation
- Reduce "Max image edge" to 768 for faster but lower-quality results
- A GPU-enabled Ollama setup (vLLM/SGLang) is dramatically faster
Model not found
- Run `docker exec -it ollama-server ollama list` to see available models
- Make sure `glm-ocr-optimized` exists, or change the model name in the UI settings
Podman machine won't start
- Try resetting: `podman machine rm && podman machine init --memory 8192 && podman machine start`
- On macOS, check that virtualization is enabled (it is by default on Apple Silicon)
- PDF.js v5.4 — PDF page rendering to canvas (cdnjs CDN)
- docx v9.6 — DOCX file generation (unpkg CDN)
- marked v16.3 — Markdown to HTML preview (cdnjs CDN)
- Ollama — local LLM serving via `/api/generate` with streaming
- GLM-OCR — 0.9B vision-language model, #1 on OmniDocBench V1.5
- No build step, no bundler, no framework — one HTML file + one shell script
Built on top of "Run the World's Best OCR on Your Own", a hands-on guide to deploying GLM-OCR locally, published on The Neural Maze. The original article covers Docker-based Ollama deployment, hardware optimization, model configuration, and the GLM-OCR SDK with layout detection. This project takes the Ollama-based approach described in their guide and wraps it in a static web interface for drag-and-drop PDF processing.
Authors:
MIT