Merged
26 commits
806d199
Initial plan
Copilot Nov 11, 2025
a52aca3
Add agent chat entry for tesseract-nanobind task
Copilot Nov 11, 2025
fe9c3d4
Add initial tesseract_nanobind implementation with basic tests
Copilot Nov 11, 2025
f17b9e5
Add advanced features, benchmarks, and justfile recipes for tesseract…
Copilot Nov 11, 2025
adc3738
Final documentation, examples, and cleanup for tesseract_nanobind
Copilot Nov 11, 2025
d727696
Add comprehensive test coverage matching pytesseract and tesserocr tests
Copilot Nov 11, 2025
dc082fa
Improve benchmark realism with real test images and add verification …
Copilot Nov 11, 2025
bd82af9
Add tesserocr API compatibility layer and comprehensive benchmark
Copilot Nov 11, 2025
15bf290
Add GitHub Actions workflows for CI/CD and wheel building
Copilot Nov 11, 2025
570f932
Add Python 3.13 and 3.14 support to CI/CD workflows
Copilot Nov 11, 2025
05b4ff0
wip on mac
hironow Nov 11, 2025
093377a
before 35% ->
hironow Nov 11, 2025
5450c27
phase 1-3
hironow Nov 11, 2025
a5da060
just -check
hironow Nov 11, 2025
ec8173c
use just on gha
hironow Nov 11, 2025
ed4fceb
cc
hironow Nov 11, 2025
436ffee
versioning
hironow Nov 11, 2025
138576d
up vers
hironow Nov 11, 2025
608fd64
Fix tesseract-build for CI environments without virtual environment
hironow Nov 11, 2025
7fc757e
Fix test command to install test dependencies in CI
hironow Nov 11, 2025
1829d5e
fix ci
hironow Nov 11, 2025
6bfec89
Reorganize documentation for better user experience
hironow Nov 11, 2025
82e301b
Fix benchmark tessdata path for cross-platform compatibility
hironow Nov 11, 2025
0600195
Address PR review feedback: improve error handling and debugging
hironow Nov 11, 2025
d27b25e
Optimize GetThresholdedImage pixel copy for better performance
hironow Nov 11, 2025
adfdb91
Rewrite README for clarity and updated benchmarks
hironow Nov 11, 2025
48 changes: 48 additions & 0 deletions .claude/settings.local.json
@@ -0,0 +1,48 @@
{
  "permissions": {
    "allow": [
      "Bash(uv:*)",
      "Bash(brew --prefix)",
      "Bash(brew list:*)",
      "Bash(pkg-config:*)",
      "Bash(brew install:*)",
      "Bash(test:*)",
      "Bash(just:*)",
      "Bash(git restore:*)",
      "WebSearch",
      "WebFetch(domain:github.com)",
      "WebFetch(domain:raw.githubusercontent.com)",
      "Bash(brew --prefix:*)",
      "Bash(find:*)",
      "Bash(head:*)",
      "Bash(done)",
      "Bash(git mv:*)"
    ],
    "deny": [
      "Bash(sudo:*)",
      "Bash(rm -rf:*)",
      "Bash(npm:*)",
      "Bash(npx:*)",
      "Bash(python3:*)",
      "Bash(pip3:*)",
      "Bash(pip:*)",
      "Bash(git push:*)",
      "Read(.env.keys)",
      "Read(id_rsa)",
      "Read(id_ed25519)",
      "Read(**/*token*)",
      "Read(**/*key*)",
      "Read(**/private/**)",
      "Write(.env.keys)",
      "Write(**/secrets/**)",
      "Write(**/private/**)",
      "Bash(wget:*)",
      "Bash(psql:*)",
      "Bash(mysql:*)",
      "Bash(mongod:*)"
    ],
    "ask": [
      "Bash(rm -f:*)"
    ]
  }
}
170 changes: 170 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,170 @@
# GitHub Actions Workflows

This directory contains GitHub Actions workflows for the Tesseract Nanobind project.

## Workflows

### 1. Tesseract Nanobind CI (`tesseract-nanobind-ci.yaml`)

**Purpose**: Continuous Integration for build, test, and code quality checks.

**Triggers**:
- Push to `main` or `develop` branches (when tesseract_nanobind_benchmark files change)
- Pull requests to `main` or `develop` branches
- Manual dispatch

**Jobs**:

#### build-and-test
- **Matrix**: Tests on Ubuntu and macOS with Python 3.8-3.14
- **Steps**:
  1. Checkout repository with submodules
  2. Install system dependencies (Tesseract, Leptonica, CMake)
  3. Install Python dependencies
  4. Build the package
  5. Run test suite with coverage
  6. Upload coverage to Codecov (Ubuntu + Python 3.11 only)

#### compatibility-test
- **Purpose**: Verify tesserocr API compatibility
- **Platform**: Ubuntu with Python 3.11
- **Steps**:
  1. Install tesserocr alongside tesseract_nanobind
  2. Run compatibility tests to ensure drop-in replacement works

#### benchmark
- **Purpose**: Performance comparison against pytesseract and tesserocr
- **Triggers**: Only on pull requests or manual dispatch
- **Platform**: Ubuntu with Python 3.11
- **Steps**:
  1. Install all three implementations (pytesseract, tesserocr, tesseract_nanobind)
  2. Initialize test image submodules
  3. Run comprehensive benchmark comparing all three
  4. Upload benchmark results as artifact

#### code-quality
- **Purpose**: Code quality checks with ruff
- **Platform**: Ubuntu with Python 3.11
- **Steps**:
  1. Run ruff linter
  2. Check code formatting

### 2. Build Wheels (`tesseract-nanobind-build-wheels.yaml`)

**Purpose**: Build distributable wheels for multiple platforms.

**Triggers**:
- Push tags matching `tesseract-nanobind-v*`
- Manual dispatch

**Jobs**:

#### build_wheels
- **Matrix**: Build on Ubuntu and macOS
- **Uses**: cibuildwheel for building wheels
- **Platforms**:
  - Linux: x86_64 (Python 3.10-3.14, per `CIBW_BUILD`)
  - macOS: x86_64 and arm64 (Python 3.10-3.14, per `CIBW_BUILD`)
- **Output**: Wheels for each platform uploaded as artifacts

#### build_sdist
- **Purpose**: Build source distribution
- **Platform**: Ubuntu
- **Output**: Source tarball uploaded as artifact

#### release
- **Purpose**: Create GitHub release with built wheels
- **Triggers**: Only on tag push
- **Steps**:
  1. Download all wheel and sdist artifacts
  2. Create GitHub release with all distribution files

## Usage

### Running CI Locally

To test the build and test process locally before pushing:

```bash
# Navigate to the project directory
cd tesseract_nanobind_benchmark

# Install dependencies
pip install -e .

# Run tests
pytest tests/ -v

# Run benchmarks
python benchmarks/compare_all.py
```

### Triggering Manual Workflows

1. Go to the Actions tab in GitHub
2. Select the workflow (e.g., "Tesseract Nanobind CI")
3. Click "Run workflow"
4. Select the branch and click "Run workflow"

### Creating a Release

To create a release with built wheels:

```bash
# Tag the release
git tag tesseract-nanobind-v0.1.0
git push origin tesseract-nanobind-v0.1.0
```

This will automatically trigger the wheel building workflow and create a GitHub release.
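
Before pushing, it can help to confirm that a tag name would match the workflow's trigger pattern. The sketch below approximates the `tesseract-nanobind-v*` filter with Python's `fnmatch`. The helper name `matches_release_tag` is hypothetical, and `fnmatch` is only an approximation of GitHub's filter syntax: in GitHub's syntax `*` does not match `/`, so the sketch rejects such tags explicitly.

```python
from fnmatch import fnmatchcase

# Pattern copied from the workflow's `on.push.tags` filter.
RELEASE_TAG_PATTERN = "tesseract-nanobind-v*"


def matches_release_tag(tag: str, pattern: str = RELEASE_TAG_PATTERN) -> bool:
    """Return True if `tag` would plausibly trigger the wheel-building workflow.

    Approximation: GitHub's `*` does not match `/`, so tags containing a
    slash are rejected before falling back to fnmatch.
    """
    if "/" in tag:
        return False
    return fnmatchcase(tag, pattern)


if __name__ == "__main__":
    for candidate in ("tesseract-nanobind-v0.1.0", "v0.1.0"):
        print(candidate, matches_release_tag(candidate))
```

A plain `v0.1.0` tag does not match, so it would not start the wheel build.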

## Badges

Add these badges to your README.md:

```markdown
[![Tesseract Nanobind CI](https://github.com/hironow/Coders/actions/workflows/tesseract-nanobind-ci.yaml/badge.svg)](https://github.com/hironow/Coders/actions/workflows/tesseract-nanobind-ci.yaml)
[![Build Wheels](https://github.com/hironow/Coders/actions/workflows/tesseract-nanobind-build-wheels.yaml/badge.svg)](https://github.com/hironow/Coders/actions/workflows/tesseract-nanobind-build-wheels.yaml)
```

## Dependencies

### System Dependencies
- **Tesseract OCR**: OCR engine
- **Leptonica**: Image processing library
- **CMake**: Build system
- **pkg-config**: Library configuration

### Python Dependencies
- **pytest**: Testing framework
- **pillow**: Image processing
- **numpy**: Array operations
- **pytesseract**: (benchmark only)
- **tesserocr**: (compatibility test and benchmark only)

## Troubleshooting

### Build Failures

If builds fail due to missing dependencies:

1. **Ubuntu**: Ensure `tesseract-ocr`, `libtesseract-dev`, and `libleptonica-dev` are installed
2. **macOS**: Ensure `tesseract` and `leptonica` are installed via Homebrew
3. **CMake**: Verify CMake >= 3.15 is available
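
The CMake requirement can also be checked from a script. This is a minimal sketch, assuming `cmake` is on `PATH` and prints its usual `cmake version X.Y.Z` banner; the helper names `parse_cmake_version` and `cmake_is_new_enough` are hypothetical and not part of this project.

```python
import re
import shutil
import subprocess
from typing import Tuple

MIN_CMAKE = (3, 15)  # minimum version stated in this README


def parse_cmake_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("could not find a version in cmake output")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def cmake_is_new_enough(minimum: Tuple[int, int] = MIN_CMAKE) -> bool:
    """Return True if a `cmake` on PATH reports at least `minimum`."""
    exe = shutil.which("cmake")
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=True
    )
    return parse_cmake_version(result.stdout)[:2] >= minimum


if __name__ == "__main__":
    print("CMake OK" if cmake_is_new_enough() else "CMake missing or older than 3.15")
```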

### Test Failures

If tests fail:

1. Check that all dependencies are installed correctly
2. Verify Tesseract language data is available (eng.traineddata)
3. Review test output for specific failure reasons
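
To check step 2 quickly, a short script can look for `eng.traineddata`. The sketch below is illustrative only: the function name `find_traineddata` is hypothetical, and the fallback directories are assumptions about common install locations that vary by distribution and Tesseract version. `TESSDATA_PREFIX` is the environment variable Tesseract itself consults.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# Illustrative fallback locations (assumptions); real installs vary.
DEFAULT_TESSDATA_DIRS = (
    "/usr/share/tesseract-ocr/5/tessdata",
    "/usr/share/tessdata",
    "/opt/homebrew/share/tessdata",
    "/usr/local/share/tessdata",
)


def find_traineddata(lang: str = "eng", extra_dirs: Iterable[str] = ()) -> Optional[Path]:
    """Return the first `<lang>.traineddata` found, or None."""
    candidates = list(extra_dirs)
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        # Depending on the Tesseract version, TESSDATA_PREFIX points either at
        # the tessdata directory itself or at its parent, so try both.
        candidates += [prefix, os.path.join(prefix, "tessdata")]
    candidates += DEFAULT_TESSDATA_DIRS
    for directory in candidates:
        path = Path(directory) / f"{lang}.traineddata"
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    found = find_traineddata("eng")
    print(found if found else "eng.traineddata not found in the searched directories")
```

If nothing is found, installing the language data package for your platform or setting `TESSDATA_PREFIX` is the usual next step.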

### Coverage Upload

Coverage is only uploaded from:
- Ubuntu latest
- Python 3.11
- Main CI workflow

A failed coverage upload does not fail the CI run; the upload step is configured as non-blocking.
114 changes: 114 additions & 0 deletions .github/workflows/tesseract-nanobind-build-wheels.yaml
@@ -0,0 +1,114 @@
name: Build Wheels

on:
  push:
    tags:
      - 'tesseract-nanobind-v*'
  workflow_dispatch:

jobs:
  build_wheels:
    name: Build wheels on ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install system dependencies (Ubuntu)
        if: runner.os == 'Linux'
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            tesseract-ocr \
            libtesseract-dev \
            libleptonica-dev \
            pkg-config \
            cmake \
            ninja-build

      - name: Install system dependencies (macOS)
        if: runner.os == 'macOS'
        run: |
          brew install tesseract leptonica pkg-config cmake ninja

      - name: Build wheels
        uses: pypa/cibuildwheel@v2.16.5
        env:
          CIBW_BUILD: cp310-* cp311-* cp312-* cp313-* cp314-*
          CIBW_SKIP: "*-musllinux_* *-manylinux_i686 *-win32"
          CIBW_ARCHS_LINUX: x86_64
          CIBW_ARCHS_MACOS: x86_64 arm64
          CIBW_BEFORE_BUILD_LINUX: |
            yum install -y tesseract-devel leptonica-devel || \
            (apt-get update && apt-get install -y libtesseract-dev libleptonica-dev)
          CIBW_BEFORE_BUILD_MACOS: |
            brew install tesseract leptonica
          CIBW_TEST_REQUIRES: pytest>=9.0 pillow>=12.0 numpy>=2.0
          CIBW_TEST_COMMAND: pytest {project}/tesseract_nanobind_benchmark/tests/test_basic.py -v
        with:
          package-dir: ./tesseract_nanobind_benchmark

      - name: Upload wheels
        uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}
          path: ./wheelhouse/*.whl

  build_sdist:
    name: Build source distribution
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          submodules: recursive

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Build sdist
        working-directory: tesseract_nanobind_benchmark
        run: |
          python -m pip install --upgrade pip build
          python -m build --sdist

      - name: Upload sdist
        uses: actions/upload-artifact@v4
        with:
          name: sdist
          path: tesseract_nanobind_benchmark/dist/*.tar.gz

  release:
    name: Create GitHub Release
    needs: [build_wheels, build_sdist]
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')

    steps:
      - name: Download artifacts
        uses: actions/download-artifact@v4
        with:
          path: dist

      - name: Create Release
        uses: softprops/action-gh-release@v1
        with:
          files: dist/**/*
          draft: false
          prerelease: false
          generate_release_notes: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}