costinEEST/browser-pdf-ocr
GLM-OCR Web

A static, single-file website that converts scanned PDFs to Markdown or DOCX using GLM-OCR — a 0.9B parameter vision-language model that ranks #1 on OmniDocBench V1.5 — running locally via Ollama.

No cloud services. No API keys. Everything runs on your machine.


Architecture

┌──────────────────┐                 ┌──────────────────┐
│     Browser      │    REST API     │  Ollama Server   │
│                  │   (streaming)   │                  │
│  PDF.js renders  │ ──────────────► │  GLM-OCR model   │
│  pages to JPEG   │  base64 image   │  (0.9B params)   │
│                  │                 │                  │
│  marked.js shows │ ◄────────────── │  Runs on CPU     │
│  live preview    │  Markdown text  │  inside Docker/  │
│                  │                 │  Podman          │
│  docx.js builds  │                 │                  │
│  Word exports    │                 │ localhost:11434  │
└──────────────────┘                 └──────────────────┘
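Concretely, the REST arrow above is one POST per page to Ollama's `/api/generate` endpoint, carrying the rendered page as a base64 JPEG. A minimal sketch of the request body (the field names follow Ollama's API; the helper name and defaults are illustrative):

```javascript
// Build the JSON body for a POST to http://localhost:11434/api/generate.
// "images" takes raw base64 image data (no "data:image/jpeg;base64," prefix);
// "stream: true" asks Ollama to answer with newline-delimited JSON chunks.
function buildOcrRequest(base64Jpeg, prompt = "Text recognition:") {
  return JSON.stringify({
    model: "glm-ocr-optimized",
    prompt,
    images: [base64Jpeg],
    stream: true,
  });
}
```

The browser then issues `fetch(endpoint + "/api/generate", { method: "POST", body })` and reads the response body incrementally.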

What You Get

  • Drop a scanned PDF → each page is rendered to a JPEG, sent to GLM-OCR, and the recognized text streams back live
  • Markdown output with headings, tables, and formatting preserved
  • Export to .md or .docx with one click
  • No build step — the entire UI is a single index.html file with ESM imports from CDNs
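The live streaming in the first bullet works because Ollama replies with newline-delimited JSON: each line is a small object carrying a `response` text fragment and a `done` flag on the final chunk. A sketch of the accumulation logic (function name is ours):

```javascript
// Fold a chunk of Ollama's NDJSON stream into accumulated text.
// Each complete line looks like {"response":"...","done":false}.
function accumulateOllamaChunks(ndjson) {
  let text = "";
  let done = false;
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;        // skip blank/partial trailing lines
    const msg = JSON.parse(line);
    if (msg.response) text += msg.response;
    if (msg.done) done = true;         // final object signals completion
  }
  return { text, done };
}
```

In the real page this runs on every chunk from the fetch reader, and the growing Markdown string is re-rendered through marked.js.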

Prerequisites

You need two things installed:

| Requirement | Why | Install |
| --- | --- | --- |
| Docker or Podman | Runs the Ollama server + GLM-OCR model | Docker Desktop or Podman |
| A local HTTP server | Browsers block requests from `file://` pages to `localhost` (CORS) | Python 3, Node.js, PHP, or any static server |

The setup-glm-ocr.sh script automates everything else: pulling the model (~2-4 GB), detecting your CPU cores, and creating the optimized configuration.


Quick Start

1. Clone or download this project

git clone <repo-url>
cd browser-pdf-ocr

2. Run the setup script

The script handles Docker/Podman detection, Podman VM setup (macOS/Windows), model download, and hardware-optimized configuration — all automatically.


macOS

macOS includes bash and python3 out of the box. If you use Podman, the script will initialize and start the Linux VM for you.

# Make the script executable (first time only)
chmod +x setup-glm-ocr.sh

# Run it
./setup-glm-ocr.sh

Podman users: The script auto-detects if the Podman machine is stopped and starts it. It also checks the VM memory and increases it to 8 GB if needed (GLM-OCR requires ~4.2 GB).


Linux

Works identically to macOS. Docker typically runs natively (no VM), and Podman runs rootless without a machine.

chmod +x setup-glm-ocr.sh
./setup-glm-ocr.sh

Permissions note: If Docker requires sudo, either run the script with sudo or add your user to the docker group:

sudo usermod -aG docker $USER
# Log out and back in, then re-run the script

Windows

The script is written in Bash. On Windows you have three options:

Option A — Git Bash (recommended)

Git Bash ships with Git for Windows and can run .sh scripts directly. Make sure Docker Desktop or Podman is installed and running.

# In Git Bash
./setup-glm-ocr.sh

Note: chmod +x is not required in Git Bash — scripts are executable by default.

Option B — WSL (Windows Subsystem for Linux)

If you use WSL, the script runs exactly like on native Linux. Make sure Docker Desktop has the WSL 2 integration enabled (Settings → Resources → WSL Integration).

chmod +x setup-glm-ocr.sh
./setup-glm-ocr.sh

Option C — PowerShell (manual setup)

If you prefer not to use Bash, run the equivalent commands manually in PowerShell:

# Pull and start Ollama
docker run -d --name ollama-server `
  -v ollama_storage:/root/.ollama `
  -p 11434:11434 `
  -e OLLAMA_ORIGINS="*" `
  ollama/ollama

# Pull the model
docker exec -it ollama-server ollama pull glm-ocr

# Detect your physical core count
(Get-CimInstance Win32_Processor | Measure-Object -Property NumberOfCores -Sum).Sum

# Create the optimized model (replace 10 with your core count)
docker exec -i ollama-server bash -c @"
cat > /tmp/GLM-Config << 'MODELFILE'
FROM glm-ocr
PARAMETER num_ctx 16384
PARAMETER num_thread 10
PARAMETER num_predict 8192
PARAMETER temperature 0
PARAMETER top_p 0.00001
PARAMETER top_k 1
PARAMETER repeat_penalty 1.1
MODELFILE
ollama create glm-ocr-optimized -f /tmp/GLM-Config
"@

3. Serve the website

After the setup script completes, it will suggest the right command based on what's installed on your system. The most common options:

# Python (macOS / Linux — usually pre-installed)
python3 -m http.server 8080

# Node.js
npx http-server -p 8080 -c-1

# PHP
php -S localhost:8080

Then open http://localhost:8080 in your browser.

Why not just open the HTML file directly? Browsers block cross-origin requests from file:// URLs. The page needs to call Ollama at http://localhost:11434, which requires both ends to be served over HTTP.

What about GitHub Pages or other hosted deployments? A live demo is available on GitHub Pages, but browsers treat requests from a public origin (https://...) to localhost as Private Network Access and will block them. The page must be served from localhost to communicate with Ollama.

4. Use it

  1. Drop a scanned PDF onto the upload area
  2. Verify the Ollama status dot is green (●)
  3. Click ▶ Run OCR
  4. Watch pages process with live-streamed results
  5. Export as .md or .docx

What the Script Does

setup-glm-ocr.sh runs through the following steps:

| Step | What it does |
| --- | --- |
| 0. Runtime | Detects Docker; falls back to Podman if Docker is unavailable |
| 0b. Podman VM | (macOS/Windows only) Initializes, starts, and resizes the Podman machine to 8 GB RAM |
| 1. CPU cores | Detects physical cores: `sysctl` (macOS), `lscpu` (Linux), `Get-CimInstance` (Windows) |
| 2. Container | Creates or starts the `ollama-server` container with `OLLAMA_ORIGINS="*"` for CORS |
| 3. Model pull | Downloads the `glm-ocr` weights (~2-4 GB) |
| 4. Optimize | Creates `glm-ocr-optimized` with `num_thread` set to your physical core count |
| 5. Verify | Lists available models and suggests the right HTTP server command |

Settings

The web UI has configurable settings in the left panel:

| Setting | Default | Description |
| --- | --- | --- |
| Ollama endpoint | `http://localhost:11434` | Where Ollama is running |
| Model name | `glm-ocr-optimized` | The model tag to use |
| Max image edge | 1024 px | Longest edge before sending to OCR. Lower = faster; higher = more detail |
| Prompt | Text recognition | The instruction sent to GLM-OCR |

Available prompts:

  • Text recognition — plain text output in reading order (from the original article)
  • OCR → Markdown — explicitly requests Markdown formatting with headings and tables
  • Full extraction — broadest instruction, targeting text, tables, and formulas
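The "Max image edge" setting caps the longest side of each rendered page before it is encoded to JPEG. A sketch of the scaling arithmetic (aspect ratio preserved, never upscaled; the function name is illustrative):

```javascript
// Scale (width, height) so the longest edge is at most maxEdge,
// preserving aspect ratio; images already within the limit pass through.
function fitToMaxEdge(width, height, maxEdge = 1024) {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

For example, a 2048×1024 render at the default setting is sent as 1024×512, which roughly quarters the vision-encoder work per page.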

Troubleshooting

Red status dot / "Cannot reach Ollama"

  • Are you opening via file://? Use a local HTTP server (see step 3)
  • Is Ollama running? Check with curl http://localhost:11434
  • Is CORS enabled? The container must be started with -e OLLAMA_ORIGINS="*"
  • Is the port mapped? Docker/Podman needs -p 11434:11434

500 error: "model requires more system memory"

  • The Podman VM defaults to ~2 GB RAM but GLM-OCR needs ~4.2 GB
  • Fix: podman machine stop && podman machine set --memory 8192 && podman machine start
  • The updated setup script handles this automatically

Slow processing (2-3+ minutes per page)

  • Expected on CPU — the vision encoder prefill is the bottleneck, not the text generation
  • Reduce "Max image edge" to 768 for faster but lower-quality results
  • GPU serving is dramatically faster — run Ollama with GPU support, or use a dedicated inference engine such as vLLM or SGLang

Model not found

  • Run docker exec -it ollama-server ollama list to see available models
  • Make sure glm-ocr-optimized exists, or change the model name in the UI settings

Podman machine won't start

  • Try resetting: podman machine rm && podman machine init --memory 8192 && podman machine start
  • On macOS, check that virtualization is enabled (it is by default on Apple Silicon)

Tech Stack

  • PDF.js v5.4 — PDF page rendering to canvas (cdnjs CDN)
  • docx v9.6 — DOCX file generation (unpkg CDN)
  • marked v16.3 — Markdown to HTML preview (cdnjs CDN)
  • Ollama — local LLM serving via /api/generate with streaming
  • GLM-OCR — 0.9B vision-language model, #1 on OmniDocBench V1.5
  • No build step, no bundler, no framework — one HTML file + one shell script

Credits

Built on top of "Run the World's Best OCR on Your Own", a hands-on guide to deploying GLM-OCR locally published on The Neural Maze. The original article covers Docker-based Ollama deployment, hardware optimization, model configuration, and the GLM-OCR SDK with layout detection. This project takes the Ollama-based approach described in their guide and wraps it in a static web interface for drag-and-drop PDF processing.

License

MIT
