A modern video OCR application with GLM-OCR support, featuring CLI, Web GUI, and Desktop interfaces
📥 Download Latest Release • Features • Quick Start • Installation • Usage • Documentation
VideOCR-GLM is a powerful video OCR (Optical Character Recognition) application that extracts hardcoded (burned-in) subtitles from videos using GLM-OCR technology. It provides three flexible interfaces to suit different use cases:
- Web GUI: Modern Vue 3 interface for easy visual operation
- Desktop App: Click-to-run Electron application for standalone use
- CLI: Command-line tool for automation and batch processing

Perfect for content creators, researchers, and anyone who needs to extract subtitles from videos in over 100 languages.
```mermaid
graph TB
  subgraph "Desktop App (Electron)"
    FE[Frontend<br/>Vue 3 + TypeScript]
    BE[Backend<br/>Node.js + Express]
  end
  subgraph "CLI (Python)"
    CLI[VideOCR-GLM-CLI<br/>Python 3]
  end
  subgraph "OCR Engine"
    OLLAMA[Ollama<br/>GLM-OCR Model]
  end
  FE -->|HTTP/WebSocket| BE
  BE -->|Child Process| CLI
  CLI -->|API Request| OLLAMA
  CLI -->|Video Processing| CLI
  style FE fill:#42b883
  style BE fill:#68a063
  style CLI fill:#3776ab
  style OLLAMA fill:#fca311
```
Architecture Overview:
- Frontend (Vue 3): Modern web interface for video preview, crop zone selection, and queue management
- Backend (Node.js + Express): RESTful API server with WebSocket support for real-time progress updates
- CLI (Python): Video processing engine that extracts subtitles using GLM-OCR
- Ollama: Local AI runtime hosting the GLM-OCR model for optical character recognition
- Electron: Desktop application wrapper that packages the frontend and backend into a standalone app

Data Flow:
- User uploads video and configures settings in the Frontend
- Frontend sends request to Backend via HTTP/WebSocket
- Backend spawns CLI process with video and configuration
- CLI processes video frames and sends them to Ollama for OCR
- Ollama returns recognized text to CLI
- CLI generates SRT file and streams progress to Backend
- Backend broadcasts progress to Frontend via WebSocket
- Frontend displays real-time progress and results to user
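The SRT-generation step in the flow above can be sketched as follows. This is an illustrative reconstruction, not the CLI's actual code; the function names and the (start, end, text) segment shape are assumptions:

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start_seconds, end_seconds, text) tuples as numbered SRT cues."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)


print(segments_to_srt([(0.0, 1.5, "Hello"), (2.0, 3.25, "World")]))
```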
- Interactive Video Preview - Preview videos with playback controls
- Visual Crop Zone Selection - Click and drag to select subtitle regions
- Dual Zone OCR - Support for two separate subtitle regions
- Preset Crop Positions - Quick selection of top, center, or bottom positions
- Full Frame OCR - Option to use entire frame for OCR
- Batch Processing - Add multiple videos to a processing queue
- Parallel Processing - Process multiple videos simultaneously (1-10 workers)
- Real-time Progress - Visual progress bars for each video
- Queue Persistence - Queue items persist across browser sessions
- Retry Failed Items - Easily retry failed processing attempts
- 100+ Languages - Support for multiple OCR languages
- OCR Parameters - Fine-tune similarity threshold, SSIM threshold, frame skipping
- Image Processing - Brightness threshold, full frame options
- Ollama Integration - Configure host, port, model, and timeout
- Settings Management - Save, load, and export settings
- Time Range Selection - Process specific video segments
- Responsive Design - Works on desktop and tablet devices
- Dark/Light Theme - Modern UI with Ant Design Vue components
- Real-time Updates - WebSocket-based progress updates
- Error Handling - Clear error messages and troubleshooting guidance
Video preview with interactive crop zone selection, dual zone support, and comprehensive settings configuration.
Complete settings management with OCR parameters, image processing options, Ollama configuration, and system settings.
Real-time queue management with parallel processing controls, progress visualization, and error handling.
Prerequisites:
- Ollama installed - Visit https://ollama.ai for installation instructions
- GLM-OCR model - Run: `ollama pull glm-ocr:latest`
Instructions:
- Download the installer from Releases
- Run `VideOCR-GLM Setup 1.0.0.exe`
- Launch the app and start extracting subtitles!
✅ No need to install Node.js or Python - everything is bundled in the installer.
Prerequisites:
- Ollama installed - Visit https://ollama.ai for installation instructions
- GLM-OCR model - Run: `ollama pull glm-ocr:latest`
- Node.js 18 or higher
- Python 3.8 or higher
Instructions:
```bash
# Install dependencies
npm install

# Start development servers (frontend + backend)
npm run dev:all
```

The GUI will be available at http://localhost:3000.
Prerequisites:
- Ollama installed - Visit https://ollama.ai for installation instructions
- GLM-OCR model - Run: `ollama pull glm-ocr:latest`
- Python 3.8 or higher
Instructions: See VideOCR-GLM-CLI/README.md for CLI usage and documentation.
CLI Repository: https://github.com/Benson-mk/VideOCR-GLM-CLI
| Component | Minimum Version | Recommended |
|---|---|---|
| Node.js | 18.0.0 | 20.x LTS |
| Python | 3.8 | 3.10+ |
| Ollama | Latest | Latest |
| RAM | 8 GB | 16 GB |
| Disk Space | 500 MB | 2 GB+ |
```bash
git clone https://github.com/Benson-mk/VideOCR-GLM.git
cd VideOCR-GLM

# Install all workspace dependencies (frontend + backend)
npm install

# Install Ollama (if not already installed)
# Visit https://ollama.ai for installation instructions

# Pull the GLM-OCR model
ollama pull glm-ocr:latest

# Verify installation
ollama list
```

Create a `.env` file in the root directory:
```bash
# Frontend Configuration
VITE_API_BASE_URL=http://localhost:3000

# Backend Configuration
BACKEND_PORT=5001

# Ollama Configuration
VITE_OLLAMA_HOST=localhost
VITE_OLLAMA_PORT=11434
```

Important Ports:
| Service | Port | Description |
|---|---|---|
| Frontend | 3000 | Web GUI (Vite dev server) |
| Backend | 5001 | API server (Express) |
| Ollama | 11434 | OCR model service |
```bash
# Check Node.js version
node --version   # Should be >= 18.0.0

# Check Python version
python --version # Should be >= 3.8

# Check Ollama status
ollama list      # Should show the glm-ocr model

# Test backend connection
curl http://localhost:5001/api/health
```

- Click "Select Video File" and choose a video
- Configure general settings:
  - Language selection (100+ languages supported)
  - Output directory for SRT files
  - Time range (start and end times)
- Select crop zone by clicking and dragging on the video preview
- Enable "Dual Zone" for videos with subtitles in multiple positions
- Click "Add to Queue"
Navigate to the Settings page to configure:

General Settings:
- Language selection
- Output directory
- Time range

OCR Settings:
- Similarity threshold (default: 80)
- Max merge gap (default: 0.09)
- SSIM threshold (default: 92)
- Frames to skip (default: 1)
- Post-processing options
- Minimum subtitle duration (default: 0.2)
- OCR image max width (default: 960)

Image Processing:
- Brightness threshold
- Use full frame option
- Subtitle position (top/center/bottom)

Ollama Settings:
- Host (default: localhost)
- Port (default: 11434)
- Model (default: glm-ocr:latest)
- Timeout (default: 300 seconds)

System Settings:
- Allow system sleep during processing
- Parallel workers (1-10)
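The similarity threshold and max merge gap settings above interact when consecutive frame readings of the same subtitle are deduplicated. A simplified sketch of the idea using ratio-based text similarity; this is not the CLI's actual algorithm, and `merge_segments` is a hypothetical name:

```python
from difflib import SequenceMatcher


def merge_segments(segments, sim_threshold=80, max_merge_gap=0.09):
    """Merge consecutive (start, end, text) segments whose texts are at least
    sim_threshold percent similar and whose time gap is within max_merge_gap."""
    merged = []
    for start, end, text in segments:
        if merged:
            p_start, p_end, p_text = merged[-1]
            similarity = SequenceMatcher(None, p_text, text).ratio() * 100
            if similarity >= sim_threshold and start - p_end <= max_merge_gap:
                merged[-1] = (p_start, end, p_text)  # extend the previous cue
                continue
        merged.append((start, end, text))
    return merged


# Near-identical OCR readings of the same subtitle collapse into one cue:
segs = [(0.0, 1.0, "Hello world"), (1.05, 2.0, "Hello worId"), (3.0, 4.0, "Bye")]
print(merge_segments(segs))  # [(0.0, 2.0, 'Hello world'), (3.0, 4.0, 'Bye')]
```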
- Navigate to Queue view
- Set parallel workers (1-10)
- Click "Start Processing"
- Monitor progress in real-time
- Clear completed/failed items
- Retry failed items
- Remove items from queue
- Download generated SRT files
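The parallel-workers setting bounds how many videos are processed at once. The backend does this with Node.js child processes; the worker-pool idea can be sketched in Python (illustrative only; `process_video` is a stand-in for spawning the CLI):

```python
from concurrent.futures import ThreadPoolExecutor


def process_video(path):
    """Stand-in for running the OCR CLI on one queued video."""
    return f"{path}.srt"


def process_queue(paths, parallel_workers=2):
    """Process queued videos with at most parallel_workers running at once."""
    with ThreadPoolExecutor(max_workers=parallel_workers) as pool:
        # map preserves queue order even when items finish out of order
        return list(pool.map(process_video, paths))


print(process_queue(["a.mp4", "b.mp4", "c.mp4"], parallel_workers=2))
# ['a.mp4.srt', 'b.mp4.srt', 'c.mp4.srt']
```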
After building the Electron app:
- Run the installer
- The app will launch automatically
- Use the same interface as the Web GUI
- All features work offline (except Ollama connection)
For command-line usage, see VideOCR-GLM-CLI/README.md.
```
VideOCR-GLM/
├── frontend/              # Vue 3 web interface
│   ├── src/               # Source code
│   │   ├── components/    # Vue components
│   │   ├── constants/     # Constants and enums
│   │   ├── hooks/         # Custom hooks
│   │   ├── router/        # Vue Router configuration
│   │   ├── services/      # External services
│   │   ├── stores/        # Pinia stores
│   │   ├── styles/        # Global styles
│   │   ├── types/         # TypeScript types
│   │   ├── utils/         # Utility functions
│   │   └── views/         # Page components
│   ├── public/            # Static assets
│   ├── package.json       # Frontend dependencies
│   └── vite.config.ts     # Vite configuration
├── backend/               # Node.js/Express API
│   ├── src/               # Source code
│   │   ├── services/      # Backend services
│   │   └── types/         # TypeScript types
│   ├── uploads/           # Uploaded video files
│   ├── package.json       # Backend dependencies
│   └── tsconfig.json      # TypeScript configuration
├── electron/              # Electron desktop app
│   ├── main.cjs           # Main process
│   └── preload.js         # Preload script
├── VideOCR-GLM-CLI/       # Python CLI tool
│   ├── videocr/           # CLI source code
│   ├── tests/             # CLI tests
│   └── requirements.txt   # Python dependencies
├── scripts/               # Build and utility scripts
├── config/                # Shared configuration files
├── assets/                # Application assets
├── image/                 # Screenshots
└── package.json           # Workspace root configuration
```
Frontend:
- Framework: Vue 3 (Composition API)
- Language: TypeScript
- Build Tool: Vite
- UI Library: Ant Design Vue
- State Management: Pinia
- Routing: Vue Router
- Styling: Less

Backend:
- Runtime: Node.js
- Framework: Express
- Language: TypeScript
- Real-time: WebSocket
- Process Management: Child Process

Desktop:
- Framework: Electron
- Packaging: Electron Builder
- Icon Management: rcedit

OCR:
- Engine: GLM-OCR
- Runtime: Ollama
- Languages: 100+ supported

CLI:
- Language: Python 3
- Packaging: PyInstaller
- Video Processing: PyAV
Default settings are defined in `frontend/src/stores/settings.ts`:

```typescript
const defaultGeneralSettings: GeneralSettings = {
  lang: 'en',
  output_dir: '',
  time_start: '0:00',
  time_end: '',
}

const defaultAdvancedSettings: AdvancedSettings = {
  use_dual_zone: false,
  crop_zones: [],
  ocr: {
    sim_threshold: 80,
    max_merge_gap: 0.09,
    ssim_threshold: 92,
    frames_to_skip: 1,
    post_processing: false,
    min_subtitle_duration: 0.2,
    ocr_image_max_width: 960,
  },
  image_processing: {
    brightness_threshold: null,
    use_fullframe: false,
    subtitle_position: 'center',
  },
  ollama: {
    host: 'localhost',
    port: 11434,
    model: 'glm-ocr:latest',
    timeout: 300,
  },
  system: {
    allow_system_sleep: false,
    parallel_workers: 1,
  },
}
```

Create a `.env` file in the root directory:
```bash
# API Configuration
VITE_API_BASE_URL=http://localhost:3000

# Backend Port
BACKEND_PORT=5001

# Ollama Configuration
VITE_OLLAMA_HOST=localhost
VITE_OLLAMA_PORT=11434
```

From the root directory:
```bash
# Development
npm run dev           # Run frontend only
npm run dev:backend   # Run backend only
npm run dev:all       # Run both frontend and backend
npm run dev:electron  # Run Electron in development mode

# Building
npm run build           # Build frontend
npm run build:backend   # Build backend
npm run build:all       # Build both
npm run build:electron  # Build Electron app

# Other
npm run lint     # Lint code
npm run format   # Format code
npm run clean    # Clean build artifacts
npm run preview  # Preview production build
```

You can also work directly in each workspace:

```bash
cd frontend
npm run dev    # Start development server
npm run build  # Build for production
npm run test   # Run tests

cd backend
npm run dev    # Start development server
npm run build  # Build for production
```

- Use TypeScript for type safety
- Follow Vue 3 Composition API patterns
- Use Pinia for state management
- Follow Ant Design Vue component patterns
- Use Less for styling
- Create TypeScript types in `frontend/src/types/`
- Add store logic in `frontend/src/stores/`
- Create components in `frontend/src/components/` or `frontend/src/views/`
- Add routes in `frontend/src/router/routes/`
- Update navigation in `frontend/src/App.vue`
Video preview issues:
- Ensure the video file format is supported by your browser (MP4, WebM)
- Check the browser console for errors (F12)
- Try a different video format
- Verify the video file is not corrupted
Processing fails to start:
- Ensure Python is installed and accessible: `python --version`
- Verify the CLI script exists at `VideOCR-GLM-CLI/videocr_glm_cli.py`
- Check that Ollama is running: `ollama list`
- Verify the GLM-OCR model is available: `ollama pull glm-ocr:latest`
- Review error messages in the queue view
Progress not updating:
- Check that the CLI is outputting progress information
- Verify the progress regex pattern matches the CLI output
- Check the browser console for JavaScript errors
- Ensure the WebSocket connection is established
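The backend tracks progress by matching a regex against the CLI's stdout. The line format below is hypothetical (the real pattern and output format live in the repository sources); the sketch only illustrates the parsing approach:

```python
import re

# Hypothetical progress line, e.g. "Progress: 42.5% (frame 850/2000)"
PROGRESS_RE = re.compile(r"Progress:\s*(\d+(?:\.\d+)?)%")


def parse_progress(line):
    """Return the percentage from a CLI progress line, or None if absent."""
    match = PROGRESS_RE.search(line)
    return float(match.group(1)) if match else None


print(parse_progress("Progress: 42.5% (frame 850/2000)"))  # 42.5
print(parse_progress("Loading model..."))                  # None
```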
Backend won't start:
- Check whether port 5001 is available
- Verify backend files exist in `backend/dist/`
- Check console logs for error messages
- Ensure all dependencies are installed: `npm install`
Electron app not loading:
- Verify the backend is running on port 5001
- Check whether frontend files exist in `frontend/dist/`
- Review Electron console logs for errors
- Ensure all dependencies are bundled correctly
Ollama connection issues:
- Verify Ollama is running: `ollama list`
- Check the Ollama host and port settings
- Ensure the GLM-OCR model is downloaded: `ollama pull glm-ocr:latest`
- Test the Ollama connection: `curl http://localhost:11434/api/tags`
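You can also check model availability from code by querying the same `/api/tags` endpoint the curl test hits. A minimal sketch, assuming the response JSON contains a `models` list with `name` fields, as current Ollama versions return:

```python
import json
from urllib.request import urlopen


def model_available(tags_json, model="glm-ocr:latest"):
    """Check a decoded /api/tags response dict for the given model name."""
    return any(m.get("name") == model for m in tags_json.get("models", []))


def check_ollama(host="localhost", port=11434, model="glm-ocr:latest"):
    """Fetch /api/tags from a running Ollama and report model availability."""
    with urlopen(f"http://{host}:{port}/api/tags", timeout=5) as resp:
        return model_available(json.load(resp), model)


# Offline example with a canned response:
sample = {"models": [{"name": "glm-ocr:latest"}, {"name": "llama3:8b"}]}
print(model_available(sample))  # True
```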
Build failures:
- Ensure the Node.js version is 18 or higher: `node --version`
- Clear node_modules and reinstall: `rm -rf node_modules && npm install`
- Check disk space (the Electron build needs ~500 MB free)
- Verify all dependencies are installed
- CLI Documentation: VideOCR-GLM-CLI/README.md - Command-line usage and API reference
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions:
- Check the Troubleshooting section
- Search existing GitHub Issues
- Create a new issue with:
- Clear description of the problem
- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Node.js version, etc.)
- Error messages or screenshots
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes and write tests
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Be respectful and inclusive
- Provide constructive feedback
- Follow the existing code style
- Write clear commit messages
- Update documentation as needed
- Built with Vue 3
- UI components from Ant Design Vue
- State management with Pinia
- Powered by GLM-OCR
- Desktop packaging with Electron
- Video processing with PyAV
- Original VideOCR-GLM-CLI project structure and implementation from timminator/VideOCR
