Serverless OCR API powered by DeepSeek-OCR (3B parameters) deployed on Modal.
- OpenAI-compatible vision API endpoints
- Bearer token authentication
- Base64 and URL image inputs
- Optional bounding box visualization
- Modal Volume caching for fast cold starts
- Serverless GPU inference
pip install modal
modal setup
modal secret create neo_api_key NEO_API_KEY=your-secret-key
modal run deepseek_ocr.py::download
modal deploy deepseek_ocr.py

Direct OCR endpoint for text extraction.
curl -X POST https://your-workspace--deepseek-ocr-fastapi-service.modal.run/v1/ocr \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image": "data:image/jpeg;base64,...",
    "image_type": "base64",
    "return_image": false
  }'

Response:
{
  "extracted_text": "...",
  "result_image_base64": "...",  // optional, if return_image=true
  "usage": {"prompt_tokens": 6, "completion_tokens": 150, "total_tokens": 156}
}

OpenAI vision API compatible endpoint.
curl -X POST https://your-workspace--deepseek-ocr-fastapi-service.modal.run/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/deepseek-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Extract text from this image"},
          {"type": "image_url", "image_url": {"url": "https://example.image.com"}}
        ]
      }
    ],
    "return_image": false
  }'

Response:
{
  "id": "chatcmpl-1699876543",
  "object": "chat.completion",
  "created": 1699876543,
  "model": "unsloth/deepseek-ocr",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 6, "completion_tokens": 150, "total_tokens": 156},
  "system_fingerprint": null,
  "result_image_base64": "..."  // optional, if return_image=true
}

OpenAI Python SDK Example:
from openai import OpenAI

client = OpenAI(
    base_url="https://your-workspace--deepseek-ocr-fastapi-service.modal.run/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="unsloth/deepseek-ocr",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all text from this image"},
                {"type": "image_url", "image_url": {"url": "https://example.image.com"}}
            ]
        }
    ],
    stream=False  # Important: streaming not yet supported
)
print(response.choices[0].message.content)

Health check endpoint.
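To verify the deployment is up, hit the health check route. A minimal sketch, assuming the route is exposed at /health (the exact path is defined in deepseek_ocr.py):

import requests

# Hypothetical /health path; adjust to the route actually exposed by deepseek_ocr.py.
resp = requests.get(
    "https://your-workspace--deepseek-ocr-fastapi-service.modal.run/health",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # include only if the route is protected
)
print(resp.status_code, resp.text)

Full Python example: base64-encode a local file and send it to the /v1/ocr endpoint with requests.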
import requests
import base64

# Read image from local file
with open("document.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# OCR request
response = requests.post(
    "https://your-workspace--deepseek-ocr-fastapi-service.modal.run/v1/ocr",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "image": f"data:image/jpeg;base64,{image_b64}",
        "image_type": "base64",
        "return_image": True  # Optional: get bounding boxes visualization
    }
)
result = response.json()
print(result["extracted_text"])

# Save result image with bounding boxes (optional)
if "result_image_base64" in result:
    with open("result_with_boxes.jpg", "wb") as f:
        f.write(base64.b64decode(result["result_image_base64"]))

The same request through the OpenAI SDK, sending the base64 image inline and passing the optional return_image flag via extra_body:

from openai import OpenAI
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://your-workspace--deepseek-ocr-fastapi-service.modal.run/v1"
)

response = client.chat.completions.create(
    model="unsloth/deepseek-ocr",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract text from this image"},
                # image_b64 comes from the previous example
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}}
            ]
        }
    ],
    extra_body={"return_image": True}  # Optional
)
print(response.choices[0].message.content)

Edit ocr_config.py:
MODEL_NAME = "unsloth/DeepSeek-OCR"
MAX_SEQ_LENGTH = 8192
LOAD_IN_4BIT = False
GPU_TYPE = "H100"
SCALEDOWN_WINDOW = 300 # seconds.
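These are plain module-level constants; the sketch below shows the assumed wiring (illustrative, not the repository's exact import list). deepseek_ocr.py reads them at module load, so edits take effect on the next deploy.

# Assumed import in deepseek_ocr.py:
from ocr_config import MODEL_NAME, MAX_SEQ_LENGTH, LOAD_IN_4BIT, GPU_TYPE, SCALEDOWN_WINDOW

After editing, redeploy with modal deploy deepseek_ocr.py.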
├── deepseek_ocr.py # Main app with Modal decorators & FastAPI endpoints
├── ocr_config.py # Configuration & dependencies
└── ocr_utils.py # Image processing utilities
Modal provides true serverless GPU inference - you only pay for actual GPU time, not idle containers. Volume caching keeps cold starts fast, and the service scales to zero when not in use. Perfect for OCR workloads with unpredictable traffic patterns.
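For reference, a minimal sketch of how such a service typically maps onto Modal's API. This is not the repository's actual code: the volume name and base image are assumptions, while the GPU type, scaledown window, and function name mirror ocr_config.py and the deployed URL.

import modal

app = modal.App("deepseek-ocr")

# Hypothetical volume name; cached model weights persist here across cold starts.
model_cache = modal.Volume.from_name("deepseek-ocr-cache", create_if_missing=True)
image = modal.Image.debian_slim().pip_install("fastapi[standard]")  # assumed base image

@app.function(
    image=image,
    gpu="H100",                       # GPU_TYPE
    volumes={"/cache": model_cache},  # Modal Volume caching for fast cold starts
    scaledown_window=300,             # SCALEDOWN_WINDOW: idle seconds before scaling to zero
)
@modal.asgi_app()
def fastapi_service():
    from fastapi import FastAPI
    web_app = FastAPI()
    # The OCR routes (/v1/ocr, /v1/chat/completions) would be registered here.
    return web_app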
Yes! The /v1/chat/completions endpoint is fully compatible with OpenAI's Python SDK. Just set stream=False in your request (streaming not yet supported). See the API reference above for example code.
Unsloth optimizes model loading and inference performance. DeepSeek OCR through Unsloth's FastVisionModel achieves <1 second inference time while maintaining 97% accuracy. The framework handles quantization and memory optimization automatically.
- Document digitization
- Invoice and receipt processing
- Form data extraction
- Table extraction
- Handwriting recognition
- ID verification
Apache License. See individual model licenses on HuggingFace.