Lightweight OCR text extraction: a Python library that extracts printed text from images using pixel analysis and pattern matching. No ML models needed.

graph TD
A[Input Image] --> B[ImageLoader]
B --> C[Preprocessor]
C --> D[Binarizer]
D --> E[Line Segmenter]
E --> F[Character Segmenter]
F --> G[Pattern Matcher]
G --> H[Text Output]
subgraph OCRLite Pipeline
B
C
D
E
F
G
end
style A fill:#e1f5fe
style H fill:#e8f5e9
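
This README does not say how the Pattern Matcher stage in the pipeline above compares a segmented character against known shapes. One common approach for printed text is template matching, sketched below under stated assumptions: the `TEMPLATES` table, the 3x3 glyph size, and the function names `match_score` and `match_character` are hypothetical and are not taken from OCRLite. Glyphs are assumed to be already binarized and scaled to the template size.

```python
from typing import Dict, List, Tuple

Glyph = List[List[int]]  # binary pixels: 1 = ink, 0 = background

# Hypothetical 3x3 templates, for illustration only.
TEMPLATES: Dict[str, Glyph] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}


def match_score(glyph: Glyph, template: Glyph) -> float:
    """Fraction of pixels that agree between two equally sized binary glyphs."""
    total = 0
    agree = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            agree += int(g == t)
    return agree / total if total else 0.0


def match_character(
    glyph: Glyph, templates: Dict[str, Glyph] = TEMPLATES
) -> Tuple[str, float]:
    """Return the best-matching character and its score in [0, 1]."""
    scores = {ch: match_score(glyph, tpl) for ch, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# An "L" with one missing pixel in the bottom-right corner.
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
char, confidence = match_character(noisy_l)
print(char, round(confidence, 2))  # L 0.89
```

A per-character score like this is one plausible source for a confidence value: characters whose best score falls below a cutoff could be flagged or dropped.
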
classDiagram
class OCRLite {
+OCRConfig config
+load_image(path) NDArray
+preprocess(image) NDArray
+binarize(image, threshold) NDArray
+segment_lines(image) list~NDArray~
+segment_characters(image) list~NDArray~
+extract_text(image) OCRResult
+detect_orientation(image) int
+get_confidence(result) float
+export(result, format) str
}
class OCRConfig {
+int default_threshold
+int min_line_height
+int min_char_width
+float confidence_threshold
+bool auto_orient
}
class OCRResult {
+str text
+float confidence
+int orientation
+list~LineResult~ lines
}
OCRLite --> OCRConfig
OCRLite --> OCRResult
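
The feature list in this README describes character segmentation as connected component analysis. The sketch below shows that technique in isolation, assuming a binarized line image stored as nested lists (1 = ink) and 4-connectivity. The function name `connected_components` and the bounding-box return format are assumptions made for illustration; the actual `segment_characters` method is documented as returning image arrays, and it may differ in other ways too.

```python
from collections import deque
from typing import List, Tuple

# (left, top, right_exclusive, bottom_exclusive)
Box = Tuple[int, int, int, int]


def connected_components(binary: List[List[int]]) -> List[Box]:
    """Bounding boxes of 4-connected ink regions, sorted left to right."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y0 in range(height):
        for x0 in range(width):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            left, top, right, bottom = x0, y0, x0, y0
            while queue:
                x, y = queue.popleft()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right + 1, bottom + 1))
    return sorted(boxes)


# Two glyph-like blobs separated by a blank column.
line = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
char_boxes = connected_components(line)
print(char_boxes)  # [(0, 0, 2, 3), (3, 0, 5, 3)]
```

Characters made of several disconnected strokes, such as "i" or ":", produce more than one component, so a real segmenter would need a merging step; a minimum-width filter along the lines of `min_char_width` could discard specks of noise.
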
pip install -e .

from ocrlite import OCRLite
ocr = OCRLite()
# Extract text from an image
result = ocr.extract_text(ocr.load_image("document.png"))
print(result.text)
print(f"Confidence: {ocr.get_confidence(result):.2%}")
# Export as JSON
json_output = ocr.export(result, format="json")

from ocrlite.config import OCRConfig
config = OCRConfig(
default_threshold=128,
min_line_height=10,
min_char_width=5,
confidence_threshold=0.5,
auto_orient=True,
)
ocr = OCRLite(config=config)

python -m ocrlite extract document.png
python -m ocrlite extract document.png --format json

- No ML models: pure pixel analysis and pattern matching
- Binarization: adaptive and global thresholding via NumPy
- Line segmentation: horizontal projection-based splitting
- Character segmentation: connected component analysis
- Orientation detection: automatic rotation correction
- Multiple export formats: plain text, JSON, CSV
- Configurable: full control over thresholds and parameters via Pydantic models
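
To make the binarization and line segmentation entries above concrete, here is a minimal sketch of global thresholding followed by horizontal projection splitting. It uses plain Python lists instead of NumPy so that it runs without dependencies, and it does not cover the adaptive thresholding variant. The function bodies are assumptions written for illustration, not OCRLite's implementation; the parameter names `threshold` and `min_line_height` simply mirror the configuration fields shown earlier.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, pixel values 0-255


def binarize(image: Image, threshold: int = 128) -> Image:
    """Global thresholding: pixels darker than `threshold` become ink (1)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment_lines(binary: Image, min_line_height: int = 1) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows with ink."""
    # Horizontal projection: amount of ink in each row.
    profile = [sum(row) for row in binary]
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y  # a text line begins
        elif ink == 0 and start is not None:
            if y - start >= min_line_height:
                lines.append((start, y))
            start = None  # a blank row ends the text line
    # Close a line that runs to the bottom edge of the image.
    if start is not None and len(profile) - start >= min_line_height:
        lines.append((start, len(profile)))
    return lines


page = [
    [255, 255, 255, 255],
    [255,  10,  20, 255],
    [255,  15, 255, 255],
    [255, 255, 255, 255],
    [255, 255,  30, 255],
]
binary = binarize(page)
line_spans = segment_lines(binary)
print(line_spans)  # [(1, 3), (4, 5)]
```

Raising `min_line_height` drops thin runs of ink, which is one way to suppress noise rows: with `min_line_height=2` the single-row span at the bottom of this toy page is discarded.
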
make install # Install dependencies
make test # Run tests
make lint # Run linter
make format # Format code

Inspired by OCR and document AI trends
Built by Officethree Technologies | Made with love and AI