costinEEST/browser-pdf-ocr
GLM-OCR Web

A static, single-file website that converts scanned PDFs to Markdown or DOCX using GLM-OCR — a 0.9B parameter vision-language model that ranks #1 on OmniDocBench V1.5 — running locally via Ollama.

No cloud services. No API keys. Everything runs on your machine.


Architecture

┌──────────────────┐                 ┌──────────────────┐
│     Browser      │    REST API     │  Ollama Server   │
│                  │   (streaming)   │                  │
│  PDF.js renders  │ ──────────────► │  GLM-OCR model   │
│  pages to JPEG   │  base64 image   │  (0.9B params)   │
│                  │                 │                  │
│  marked.js shows │ ◄────────────── │  Runs on CPU     │
│  live preview    │  Markdown text  │  inside Docker/  │
│                  │                 │  Podman          │
│  docx.js builds  │                 │                  │
│  Word exports    │                 │ localhost:11434  │
└──────────────────┘                 └──────────────────┘
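Concretely, the REST arrow above is one POST per page to Ollama's `/api/generate` endpoint, carrying the rendered page as a base64 JPEG. A minimal sketch of the request body (the field names follow Ollama's API; the helper name and defaults are illustrative):

```javascript
// Build the JSON body for a POST to http://localhost:11434/api/generate.
// "images" takes raw base64 image data (no "data:image/jpeg;base64," prefix);
// "stream: true" asks Ollama to answer with newline-delimited JSON chunks.
function buildOcrRequest(base64Jpeg, prompt = "Text recognition:") {
  return JSON.stringify({
    model: "glm-ocr-optimized",
    prompt,
    images: [base64Jpeg],
    stream: true,
  });
}
```

The browser then issues `fetch(endpoint + "/api/generate", { method: "POST", body })` and reads the response body incrementally.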

What You Get

  • Drop a scanned PDF → each page is rendered to a JPEG, sent to GLM-OCR, and the recognized text streams back live
  • Markdown output with headings, tables, and formatting preserved
  • Export to .md or .docx with one click
  • No build step — the entire UI is a single index.html file with ESM imports from CDNs
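The live streaming in the first bullet works because Ollama replies with newline-delimited JSON: each line is a small object carrying a `response` text fragment and a `done` flag on the final chunk. A sketch of the accumulation logic (function name is ours):

```javascript
// Fold a chunk of Ollama's NDJSON stream into accumulated text.
// Each complete line looks like {"response":"...","done":false}.
function accumulateOllamaChunks(ndjson) {
  let text = "";
  let done = false;
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;        // skip blank/partial trailing lines
    const msg = JSON.parse(line);
    if (msg.response) text += msg.response;
    if (msg.done) done = true;         // final object signals completion
  }
  return { text, done };
}
```

In the real page this runs on every chunk from the fetch reader, and the growing Markdown string is re-rendered through marked.js.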

Prerequisites

You need two things installed:

| Requirement | Why | Install |
| --- | --- | --- |
| Docker or Podman | Runs the Ollama server + GLM-OCR model | Docker Desktop or Podman |
| A local HTTP server | Browsers block requests from `file://` pages to `localhost` (CORS) | Python 3, Node.js, PHP, or any static server |

The setup-glm-ocr.sh script automates everything else: pulling the model (~2-4 GB), detecting your CPU cores, and creating the optimized configuration.


Quick Start

1. Clone or download this project

git clone <repo-url>
cd browser-pdf-ocr

2. Run the setup script

The script handles Docker/Podman detection, Podman VM setup (macOS/Windows), model download, and hardware-optimized configuration — all automatically.


macOS

macOS includes bash and python3 out of the box. If you use Podman, the script will initialize and start the Linux VM for you.

# Make the script executable (first time only)
chmod +x setup-glm-ocr.sh

# Run it
./setup-glm-ocr.sh

Podman users: The script auto-detects if the Podman machine is stopped and starts it. It also checks the VM memory and increases it to 8 GB if needed (GLM-OCR requires ~4.2 GB).


Linux

Works identically to macOS. Docker typically runs natively (no VM), and Podman runs rootless without a machine.

chmod +x setup-glm-ocr.sh
./setup-glm-ocr.sh

Permissions note: If Docker requires sudo, either run the script with sudo or add your user to the docker group:

sudo usermod -aG docker $USER
# Log out and back in, then re-run the script

Windows

The script is written in Bash. On Windows you have three options:

Option A — Git Bash (recommended)

Git Bash ships with Git for Windows and can run .sh scripts directly. Make sure Docker Desktop or Podman is installed and running.

# In Git Bash
./setup-glm-ocr.sh

Note: chmod +x is not required in Git Bash — scripts are executable by default.

Option B — WSL (Windows Subsystem for Linux)

If you use WSL, the script runs exactly like on native Linux. Make sure Docker Desktop has the WSL 2 integration enabled (Settings → Resources → WSL Integration).

chmod +x setup-glm-ocr.sh
./setup-glm-ocr.sh

Option C — PowerShell (manual setup)

If you prefer not to use Bash, run the equivalent commands manually in PowerShell:

# Pull and start Ollama
docker run -d --name ollama-server `
  -v ollama_storage:/root/.ollama `
  -p 11434:11434 `
  -e OLLAMA_ORIGINS="*" `
  ollama/ollama

# Pull the model
docker exec -it ollama-server ollama pull glm-ocr

# Detect your physical core count
(Get-CimInstance Win32_Processor | Measure-Object -Property NumberOfCores -Sum).Sum

# Create the optimized model (replace 10 with your core count)
docker exec -i ollama-server bash -c @"
cat > /tmp/GLM-Config << 'MODELFILE'
FROM glm-ocr
PARAMETER num_ctx 16384
PARAMETER num_thread 10
PARAMETER num_predict 8192
PARAMETER temperature 0
PARAMETER top_p 0.00001
PARAMETER top_k 1
PARAMETER repeat_penalty 1.1
MODELFILE
ollama create glm-ocr-optimized -f /tmp/GLM-Config
"@

3. Serve the website

After the setup script completes, it will suggest the right command based on what's installed on your system. The most common options:

# Python (macOS / Linux — usually pre-installed)
python3 -m http.server 8080

# Node.js
npx http-server -p 8080 -c-1

# PHP
php -S localhost:8080

Then open http://localhost:8080 in your browser.

Why not just open the HTML file directly? Browsers block cross-origin requests from file:// URLs. The page needs to call Ollama at http://localhost:11434, which requires both ends to be served over HTTP.

What about GitHub Pages or other hosted deployments? A live demo is available on GitHub Pages, but browsers treat requests from a public origin (https://...) to localhost as Private Network Access and will block them. The page must be served from localhost to communicate with Ollama.

4. Use it

  1. Drop a scanned PDF onto the upload area
  2. Verify the Ollama status dot is green (●)
  3. Click ▶ Run OCR
  4. Watch pages process with live-streamed results
  5. Export as .md or .docx

What the Script Does

setup-glm-ocr.sh runs through the following steps:

| Step | What it does |
| --- | --- |
| 0. Runtime | Detects Docker; falls back to Podman if Docker is unavailable |
| 0b. Podman VM | (macOS/Windows only) Initializes, starts, and resizes the Podman machine to 8 GB RAM |
| 1. CPU cores | Detects physical cores: `sysctl` (macOS), `lscpu` (Linux), `Get-CimInstance` (Windows) |
| 2. Container | Creates or starts the `ollama-server` container with `OLLAMA_ORIGINS="*"` for CORS |
| 3. Model pull | Downloads the `glm-ocr` weights (~2-4 GB) |
| 4. Optimize | Creates `glm-ocr-optimized` with `num_thread` set to your physical core count |
| 5. Verify | Lists available models and suggests the right HTTP server command |

Settings

The web UI has configurable settings in the left panel:

| Setting | Default | Description |
| --- | --- | --- |
| Ollama endpoint | `http://localhost:11434` | Where Ollama is running |
| Model name | `glm-ocr-optimized` | The model tag to use |
| Max image edge | 1024 px | Longest edge before sending to OCR. Lower = faster; higher = more detail |
| Prompt | Text recognition | The instruction sent to GLM-OCR |

Available prompts:

  • Text recognition — plain text output in reading order (from the original article)
  • OCR → Markdown — explicitly requests Markdown formatting with headings and tables
  • Full extraction — broadest instruction, targeting text, tables, and formulas
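The "Max image edge" setting caps the longest side of each rendered page before it is encoded to JPEG. A sketch of the scaling arithmetic (aspect ratio preserved, never upscaled; the function name is illustrative):

```javascript
// Scale (width, height) so the longest edge is at most maxEdge,
// preserving aspect ratio; images already within the limit pass through.
function fitToMaxEdge(width, height, maxEdge = 1024) {
  const longest = Math.max(width, height);
  if (longest <= maxEdge) return { width, height };
  const scale = maxEdge / longest;
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

For example, a 2048×1024 render at the default setting is sent as 1024×512, which roughly quarters the vision-encoder work per page.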

Troubleshooting

Red status dot / "Cannot reach Ollama"

  • Are you opening via file://? Use a local HTTP server (see step 3)
  • Is Ollama running? Check with curl http://localhost:11434
  • Is CORS enabled? The container must be started with -e OLLAMA_ORIGINS="*"
  • Is the port mapped? Docker/Podman needs -p 11434:11434

500 error: "model requires more system memory"

  • The Podman VM defaults to ~2 GB RAM but GLM-OCR needs ~4.2 GB
  • Fix: podman machine stop && podman machine set --memory 8192 && podman machine start
  • The updated setup script handles this automatically

Slow processing (2-3+ minutes per page)

  • Expected on CPU — the vision encoder prefill is the bottleneck, not the text generation
  • Reduce "Max image edge" to 768 for faster but lower-quality results
  • GPU serving is dramatically faster — run Ollama with GPU support, or use a dedicated inference engine such as vLLM or SGLang

Model not found

  • Run docker exec -it ollama-server ollama list to see available models
  • Make sure glm-ocr-optimized exists, or change the model name in the UI settings

Podman machine won't start

  • Try resetting: podman machine rm && podman machine init --memory 8192 && podman machine start
  • On macOS, check that virtualization is enabled (it is by default on Apple Silicon)

Tech Stack

  • PDF.js v5.4 — PDF page rendering to canvas (cdnjs CDN)
  • docx v9.6 — DOCX file generation (unpkg CDN)
  • marked v16.3 — Markdown to HTML preview (cdnjs CDN)
  • Ollama — local LLM serving via /api/generate with streaming
  • GLM-OCR — 0.9B vision-language model, #1 on OmniDocBench V1.5
  • No build step, no bundler, no framework — one HTML file + one shell script

Credits

Built on top of "Run the World's Best OCR on Your Own", a hands-on guide to deploying GLM-OCR locally published on The Neural Maze. The original article covers Docker-based Ollama deployment, hardware optimization, model configuration, and the GLM-OCR SDK with layout detection. This project takes the Ollama-based approach described in their guide and wraps it in a static web interface for drag-and-drop PDF processing.

License

MIT
