A Python tool that watches your Greenshot folder and automatically generates step-by-step SOP documentation with manual captions (AI auto-caption support planned for Azure integration).
⚠️ AI Caption Status: Currently using manual placeholder captions. Azure OpenAI Service (GPT-4 Vision) or Azure AI Vision integration is planned for automatic step description generation.
1. You work through a process, taking screenshots with Greenshot (hotkeys)
2. Stop for 5 seconds → Script detects idle, prompts for title
3. Script generates Markdown SOP with:
- Numbered steps
- AI-generated captions (via local LLM) or manual placeholders
- Embedded image references
- Copied to clipboard for instant paste into Halo/ITGlue
- Python 3.8+
- Greenshot configured to save to a known folder
- Ditto (optional but recommended) - for clipboard history
AI-generated captions are disabled by default. Two implementation paths are planned:
-
Azure OpenAI Service (recommended for production)
- GPT-4 Vision / GPT-4o for screenshot understanding
- Azure AD authentication, Key Vault for secrets
-
Local Ollama (development/testing only)
- Requires local GPU or strong CPU
- Pull a vision model:
ollama pull llava - Set
use_llm: Truein CONFIG
See Azure AI Roadmap for implementation notes.
# Clone or download sop_generator.py
git clone <repo> # or just download the file
cd sop_generator
# Install dependencies
pip install requests
# Configure
# Edit sop_generator.py CONFIG dict or create config.jsonEdit the CONFIG dict at the top of sop_generator.py:
CONFIG = {
"greenshot_folder": Path("C:/Users/You/Pictures/Greenshot"), # Your Greenshot output folder
"output_folder": Path.home() / "Documents" / "SOPs", # Where SOPs are saved
"idle_timeout": 5, # Seconds to wait after last screenshot
"use_llm": False, # Set True for Ollama local LLM (Azure integration TBD)
# Azure AI integration - see roadmap below
}Continuously watches for new screenshots:
python sop_generator.py --watch- Work through your process, hitting Greenshot hotkeys
- Pause for 5 seconds
- Script prompts: "Enter SOP title:"
- Markdown document generated, copied to clipboard
- Paste directly into Halo KB or ITGlue
python sop_generator.py --process-existingpython sop_generator.py --caption "path/to/screenshot.png"python sop_generator.py --watch --no-llmGenerated Markdown looks like:
# How to Configure MFA in Azure AD
**Created:** 2025-01-14 14:32
**Screenshots:** 5
---
## Step 1
**Action:** [TODO: Describe this step - e.g., 'Click the Settings button'] (ref: 2025-01-14_14-30-22.png)

<!-- Additional notes: -->
---
## Step 2
**Action:** [TODO: Describe this step - e.g., 'Click the Settings button'] (ref: 2025-01-14_14-31-15.png)

<!-- Additional notes: -->
---- Generate SOP with the script
- Copy is automatically in clipboard
- Paste into Halo KB article (Markdown supported)
- Upload images from
assets/folder to Halo - Or convert to HTML for better control
- Generate SOP
- ITGlue accepts direct image paste - open the assets folder
- Drag images into ITGlue editor
- Copy the text content manually
- Configure Ditto to keep 50+ clipboard items
- Your last 50 screenshots are always available
- Can re-paste if you accidentally overwrite clipboard
Configure Greenshot filename pattern:
${capturetime:d"yyyy-MM-dd_HH-mm-ss"}for timestamps- Makes chronological sorting automatic
If you have folders of existing screenshots:
# Add to sop_generator.py or run separately
import os
from pathlib import Path
folders = Path("C:/OldDocs").glob("*")
for folder in folders:
images = sorted(folder.glob("*.png"))
if images:
# Process each folder as one SOP
passThis is expected behavior - AI captions are disabled by default. Options:
- Fill in descriptions manually (paste into Halo/ITGlue, then edit)
- Enable local Ollama: Set
use_llm: Truein CONFIG (requires local GPU) - Wait for Azure integration (roadmap item)
- Ensure Ollama is running:
ollama serve - Test:
curl http://localhost:11434/api/tags - Try a smaller model if llava is slow:
ollama pull llava-phi3
- Verify
greenshot_folderpath exists - Check Greenshot output filename pattern matches
filename_patternin config
- Ensure you're using the full asset path or uploading images to your PSA
- Consider using base64 embedded images (add
--embed-base64flag if implemented)
Planned Azure integration for automatic screenshot captioning:
| Service | Use Case | Pros | Cons |
|---|---|---|---|
| Azure OpenAI Service (GPT-4 Vision) | Generate natural language descriptions of UI actions | Best accuracy, understands context | Higher cost, requires careful prompt engineering |
| Azure AI Vision (Image Analysis 4.0) | Dense captioning, OCR for text extraction | Purpose-built for images, cheaper | Less contextual understanding |
| Azure Document Intelligence | Structured data extraction from forms | Not applicable here | Overkill for UI screenshots |
# TODO: Add to sop_generator.py - Azure OpenAI Vision integration
def _caption_with_azure_openai(self, image_path: Path) -> str:
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential
client = AzureOpenAI(
azure_endpoint="https://YOUR_RESOURCE.openai.azure.com/",
credential=DefaultAzureCredential(), # Managed Identity
api_version="2024-02-15-preview"
)
# Encode image
with open(image_path, "rb") as f:
image_b64 = base64.b64encode(f.read()).decode()
response = client.chat.completions.create(
model="gpt-4-vision", # Deployment name
messages=[{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
{"type": "text", "text": "Describe the UI action in one sentence for IT documentation"}
]
}]
)
return response.choices[0].message.content- Authentication: DefaultAzureCredential (development) → Managed Identity (production)
- Key Storage: Azure Key Vault for API keys / endpoint URLs
- Cost Control: Implement caching (hash image → store caption) to avoid re-processing
- Fallback: Keep manual placeholder if Azure call fails or quota exceeded
Possible additions:
- Core Greenshot → Markdown workflow
- Azure OpenAI Service integration (GPT-4 Vision) for auto-captions
- Azure Key Vault integration for secrets management
- Direct Halo API integration (auto-upload)
- Windows notification toast when SOP is ready
- GUI version with preview
- Template system for different doc types (runbook vs SOP vs KB)
MIT - Use freely in your MSP/IT operations