A Streamlit web app for Optical Character Recognition powered by ZAI GLM-OCR. Upload images or PDFs and extract text, formulas, tables, or structured JSON fields — all running locally on your machine, no API key required.
Main interface — upload panel (left) and recognition panel (right)
Sidebar showing hardware notice and live Activity Monitor
Information Extraction mode with JSON schema editor and preset selector
- Four extraction modes — Text, Formula, Table, and Information Extraction (JSON schema)
- PDF support — renders every page at 2× DPI; navigate pages with a visual preview
- Information Extraction presets — Personal ID, Invoice, Receipt, Business Card, or define your own JSON schema
- Multi-page processing — run OCR on the current page, a custom range, or all pages at once
- Live streaming output — results appear line-by-line as the model generates
- Cancellable runs — a Stop button aborts before the next `model.generate()` call
- Live Activity Monitor — sidebar shows CPU %, RAM, Swap, and app memory, refreshing every 2 seconds
- Hardware-aware device selection — automatically picks CUDA, Apple MPS, or CPU based on your machine
- Download results — per-page `.txt` or a combined all-pages file
- Python 3.10+
- PyTorch 2.1+ (with CUDA or MPS support as applicable)
- Streamlit 1.33+ (required for `@st.fragment`)
```bash
# 1. Clone or copy this project
git clone <your-repo-url>
cd glm-ocr-app

# 2. Create a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install streamlit transformers torch torchvision pillow pymupdf psutil
```

macOS note: If you are on Apple Silicon, install the MPS-enabled build of PyTorch from pytorch.org.

```bash
streamlit run app.py
```

The app opens in your browser at http://localhost:8501.
The model (~4 GB) is downloaded from Hugging Face on first run and cached in `./models`.
The sidebar shows a live hardware notice explaining exactly what your machine will use and how fast to expect results:
| Hardware | Mode | Speed estimate |
|---|---|---|
| NVIDIA / AMD GPU (CUDA) | float16 on GPU | ~5–15 sec / page |
| Apple Silicon ≥ 16 GB RAM | bfloat16 on MPS | ~10–30 sec / page |
| Apple Silicon < 16 GB RAM | float32 on CPU | ~2–5 min / page |
| Any CPU (no GPU) | float32 on CPU | ~2–8 min / page |
Why CPU on 8 GB Apple Silicon? GLM-OCR's KV cache during model.generate() needs roughly 6 GB of one contiguous Metal memory buffer. On an 8 GB M1/M2 Mac, after the OS kernel (~2 GB) and model weights (~2–3 GB) are loaded, Metal no longer has enough space for that allocation and will abort with an OOM crash. The app detects this at startup and falls back to CPU automatically — slow but stable.
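The fallback logic above can be sketched as a pure function. This is illustrative (the real implementation lives in `glm_ocr/hardware.py` and queries `torch.cuda.is_available()`, `torch.backends.mps.is_available()`, and `psutil`); the function name and argument shape here are assumptions:

```python
MPS_MIN_RAM_BYTES = 16 * 1024**3  # threshold from config.py

def select_device(has_cuda: bool, has_mps: bool, total_ram_bytes: int) -> tuple[str, str]:
    """Pick (device, dtype) following the hardware table above."""
    if has_cuda:
        return ("cuda", "float16")
    if has_mps and total_ram_bytes >= MPS_MIN_RAM_BYTES:
        return ("mps", "bfloat16")
    # Low-RAM Apple Silicon or no GPU at all: CPU is slow but avoids the Metal OOM.
    return ("cpu", "float32")
```

Keeping the decision in a side-effect-free function like this makes the startup behaviour easy to unit-test without any GPU present.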
```
app.py                        # Streamlit entry point (~60 lines)
glm_ocr/
├── config.py                 # All tuneable constants
├── hardware.py               # RAM detection + device/dtype selection
├── model_loader.py           # HuggingFace download check + model loading
├── pdf_utils.py              # PDF bytes → list of PIL images (PyMuPDF)
├── ocr_result.py             # OcrResult dataclass (no torch dependency)
├── input_builder.py          # Tokeniser + GPU memory cleanup
├── inference.py              # run_ocr_stream() generator + run_ocr() wrapper
└── ui/
    ├── styles.py             # CSS injection + branded header
    ├── sidebar.py            # Settings inputs — orchestrates device_notice + monitor
    ├── device_notice.py      # Per-hardware capability description
    ├── resource_monitor.py   # Live Activity Monitor (@st.fragment, 2 s refresh)
    ├── upload_panel.py       # Left column: upload, preview, page navigation
    ├── result_panel.py       # Right column: orchestrator
    ├── ocr_controls.py       # Extraction mode selector, prompt editor, page range
    └── ocr_runner.py         # Cancellable multi-page execution + live timer
```
Every file is under 175 lines. No file imports from another with relative dots (`from .x`) — all imports are absolute, which is required for Streamlit's flat run context.
Sends a fixed prompt to the model and streams the result back as plain text.
| Mode | Prompt sent to model |
|---|---|
| Text | Text Recognition: |
| Formula | Formula Recognition: |
| Table | Table Recognition: |
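The table above amounts to a fixed mode→prompt mapping. A minimal sketch (the dict name and location are assumptions; the actual constant may live in `glm_ocr/config.py` or `ocr_controls.py`):

```python
# Fixed prompts for the three non-JSON extraction modes, per the table above.
MODE_PROMPTS = {
    "Text": "Text Recognition:",
    "Formula": "Formula Recognition:",
    "Table": "Table Recognition:",
}

def prompt_for_mode(mode: str) -> str:
    """Look up the prompt string sent to the model for a given mode."""
    return MODE_PROMPTS[mode]
```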
Lets you define a JSON schema; the model fills in the empty string values. Built-in presets:
- Personal ID — name, date of birth, address, issue/expiry dates
- Invoice — vendor, customer, line items, totals, tax
- Receipt — store, items, subtotal, payment method
- Business Card — name, title, company, contact details
- Custom — free-edit text area with live JSON validation
The schema is compacted to a single line before being sent to the model.
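One natural way to do this compaction is JSON serialisation with no whitespace; a minimal sketch, assuming the standard-library `json` module (the helper name is illustrative):

```python
import json

def compact_schema(schema: dict) -> str:
    """Collapse a schema dict to one line: no spaces after ',' or ':'."""
    return json.dumps(schema, separators=(",", ":"))

compact_schema({"name": "", "date_of_birth": ""})
# → '{"name":"","date_of_birth":""}'
```

A single-line schema keeps the prompt short and avoids the model echoing back stray indentation.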
model.generate() is a blocking C++ call that cannot be interrupted mid-run. Cancellation is therefore checked at two deterministic checkpoints:
- Before `generate()` — the `_log` callback raises `OcrCancelledError` when fired for `"Running model.generate()…"`, aborting the page before the expensive call starts.
- Between pages — the multi-page loop checks the cancel flag before each subsequent page.
Pressing ⏹ Stop sets ocr_cancel_requested = True in session state. The current page will complete its generation (this cannot be avoided), but no further pages will start.
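The first checkpoint can be sketched as a log callback that consults the cancel flag just before the expensive call. `OcrCancelledError` and `_log` come from the description above; `make_log` and the callable-flag shape are assumptions for illustration:

```python
class OcrCancelledError(Exception):
    """Raised at a checkpoint when the user has pressed Stop."""

def make_log(cancel_requested):
    """Build a _log callback; cancel_requested is a zero-arg callable
    returning the current value of the session-state flag."""
    def _log(message: str) -> None:
        # Checkpoint: abort only at the known message fired right
        # before model.generate(), never mid-generation.
        if message == "Running model.generate()…" and cancel_requested():
            raise OcrCancelledError(message)
    return _log
```

Checking a flag at known messages (rather than interrupting a thread) keeps cancellation deterministic: the run stops only at points where no C++ call is in flight.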
All constants are in glm_ocr/config.py:
| Constant | Default | Description |
|---|---|---|
| `DEFAULT_MODEL_ID` | `zai-org/GLM-OCR` | HuggingFace model repo |
| `DEFAULT_CACHE_DIR` | `./models` | Local model cache path |
| `DEFAULT_MAX_NEW_TOKENS` | `2048` | Generation budget per page |
| `MAX_MAX_NEW_TOKENS` | `4096` | Upper limit of the sidebar slider |
| `MPS_MIN_RAM_BYTES` | 16 GB | Threshold below which MPS is skipped |
| `PDF_DPI_SCALE` | `2.0` | PDF render scale (2× ≈ 144 dpi) |
To reduce inference time on CPU, lower `PDF_DPI_SCALE` to 1.5 — pixel count scales with the square of the render scale, so this cuts it by ~44% with minimal quality loss for most documents.
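The arithmetic behind that trade-off: dropping the scale from 2.0 to 1.5 keeps (1.5 / 2.0)² ≈ 56% of the pixels. A one-line helper makes it easy to evaluate other scales (the function is illustrative, not part of the app):

```python
def pixel_ratio(new_scale: float, old_scale: float = 2.0) -> float:
    """Fraction of pixels kept when changing the PDF render scale.
    Pixel count grows with the square of the linear scale."""
    return (new_scale / old_scale) ** 2

pixel_ratio(1.5)  # → 0.5625, i.e. ~44% fewer pixels to run through the model
```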
inference.py doubles as a command-line tool:
```bash
# Run OCR on a single image
python glm_ocr/inference.py path/to/image.png

# Run OCR on page 2 of a PDF
python glm_ocr/inference.py document.pdf --page 1

# Use a different model or cache location
python glm_ocr/inference.py image.png --model-id zai-org/GLM-OCR --cache-dir ./models
```

ModuleNotFoundError: No module named 'streamlit'
Run pip install streamlit and ensure you are in the correct virtual environment.
Model download is very slow or fails
The model is ~4 GB. Ensure you have a stable connection. Re-running streamlit run app.py will resume a partial download from the HuggingFace cache.
`st.fragment` causes an error
You are on Streamlit < 1.33. Run pip install --upgrade streamlit.
OCR output is truncated on dense pages
Increase Max new tokens in the sidebar slider (up to 4096). The default is 2048, which covers most pages; very dense documents may need more.
Timer shows 0.0s and freezes
This is expected behaviour — the timer is driven by the _log callback, which fires before and after model.generate(). During the generation itself (which can take several minutes on CPU) the timer shows the time at the last checkpoint, not a live wall-clock tick.
- ZAI GLM-OCR — the underlying vision-language model
- Streamlit — the web framework
- HuggingFace Transformers — model loading and inference
- PyMuPDF (fitz) — fast PDF rendering