Fastener-Finder analyzes a photo of mixed hardware (screws, nuts, washers, springs, etc.), assigns a stable number to each part, and generates a labeled output image with a structured table.
The main script is hardware_test.py.
You give the script one image (hardware.jpg) with random hardware pieces on a surface.
The script:
- Finds each visible part in the image.
- Draws a red box around each part.
- Places a large ID number (1, 2, 3, ...) near each box.
- Creates a table next to the image that lists each ID, what the part is, and a short description.
- Saves one combined output image: output_labeled.jpg.
This makes it easy to visually match each physical part to its row in the table.
The workflow uses two model passes through Ollama:
- Detection pass:
  - The model returns a JSON list of detected objects.
  - Each object includes a class label and a bounding box.
  - Bounding boxes are normalized and converted to pixel coordinates.
- Enrichment pass:
  - The script sends the numbered image back to the model.
  - The model refines the label and generates a short specification and confidence score per ID.
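Both passes are ordinary chat calls through the `ollama` Python client. A minimal sketch of how the detection request might be assembled; the prompt wording, model tag, and function name here are illustrative assumptions, not the script's actual code:

```python
try:
    import ollama  # requires the `ollama` package and a running Ollama server
except ImportError:
    ollama = None

MODEL = "qwen3-vl"  # assumed model tag; must be pulled locally in Ollama

# Illustrative prompt; the real script's prompt may differ.
DETECT_PROMPT = (
    "List every visible hardware part. Respond with a JSON array where each "
    "item has 'label' and 'box_2d' (normalized [x1, y1, x2, y2])."
)

def build_detection_request(image_path: str) -> list[dict]:
    """Assemble the messages payload for ollama.chat()."""
    return [{"role": "user", "content": DETECT_PROMPT, "images": [image_path]}]

if __name__ == "__main__" and ollama is not None:
    response = ollama.chat(model=MODEL, messages=build_detection_request("hardware.jpg"))
    print(response["message"]["content"])
```

The enrichment pass follows the same shape, sending the temporary numbered image instead of the original.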
If the model output is noisy or malformed, the script uses robust JSON recovery logic so it can still proceed whenever possible.
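One way to sketch that recovery step: scan the raw model text for the first balanced JSON array and parse it, tolerating markdown fences and surrounding chatter (the function name is illustrative, and this simple bracket counting ignores brackets inside string values):

```python
import json

def recover_json_array(raw: str):
    """Best-effort extraction of the first balanced JSON array from noisy model output."""
    start = raw.find("[")
    while start != -1:
        depth = 0
        for i in range(start, len(raw)):
            if raw[i] == "[":
                depth += 1
            elif raw[i] == "]":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(raw[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next "["
        start = raw.find("[", start + 1)
    return None  # nothing parseable found
```

For example, a reply wrapped in a ```json fence with "Sure! Here are the parts:" before it still yields a usable Python list.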
hardware.jpg
|
v
[Pass 1: Detection Model via Ollama]
-> JSON detections (label + box_2d)
|
v
[Normalization + Stable ID Sorting]
-> pixel boxes + deterministic IDs
|
v
[Overlay Renderer]
-> numbered boxes on image (temporary numbered image)
|
v
[Pass 2: Enrichment Model via Ollama]
-> refined label + specification + confidence per ID
|
v
[Table Composer]
-> ID | Label | Specification | Conf | Box (px)
|
v
output_labeled.jpg
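The normalization and stable-ID stage can be sketched as follows, assuming the model emits box_2d on a 0–1000 normalized scale (a common convention for vision models, but verify against your model's actual output) and that IDs are assigned top-to-bottom, then left-to-right:

```python
def to_pixels(box_2d, width, height, scale=1000):
    """Convert a normalized [x1, y1, x2, y2] box to pixel coordinates."""
    x1, y1, x2, y2 = box_2d
    return (round(x1 * width / scale), round(y1 * height / scale),
            round(x2 * width / scale), round(y2 * height / scale))

def assign_stable_ids(detections, width, height):
    """Sort boxes top-to-bottom, then left-to-right, so IDs stay deterministic."""
    boxes = [(det["label"], to_pixels(det["box_2d"], width, height))
             for det in detections]
    boxes.sort(key=lambda item: (item[1][1], item[1][0]))  # sort by (y1, x1)
    return [{"id": i + 1, "label": label, "box_px": box}
            for i, (label, box) in enumerate(boxes)]
```

Sorting by pixel position rather than model output order is what keeps the numbering stable across runs where the model lists the same parts in a different order.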
flowchart TD
A[hardware.jpg] --> B[Pass 1: Detection Model via Ollama]
B --> C[JSON detections: label + box_2d]
C --> D[Normalization + Stable ID Sorting]
D --> E[Pixel boxes + deterministic IDs]
E --> F[Overlay Renderer]
F --> G[Numbered boxes on image]
G --> H[Pass 2: Enrichment Model via Ollama]
H --> I[Refined label + specification + confidence per ID]
I --> J[Table Composer]
J --> K["Table: ID, Label, Specification, Conf, Box (px)"]
K --> L[output_labeled.jpg]
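The overlay renderer step is plain Pillow drawing. A minimal sketch, where the color, stroke width, and label placement are assumptions rather than the script's exact choices:

```python
from PIL import Image, ImageDraw

def draw_numbered_boxes(image, items):
    """Draw a red box and its ID number for each detected part."""
    annotated = image.copy()  # leave the source image untouched
    draw = ImageDraw.Draw(annotated)
    for item in items:
        x1, y1, x2, y2 = item["box_px"]
        draw.rectangle((x1, y1, x2, y2), outline="red", width=3)
        # Place the ID just above the box; uses Pillow's default bitmap font.
        draw.text((x1, max(0, y1 - 12)), str(item["id"]), fill="red")
    return annotated
```

The same annotated image is both saved for the temporary numbered render and sent back to the model for the enrichment pass.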
- Input image: hardware.jpg
- Script: hardware_test.py
- Output image: output_labeled.jpg
The output table currently includes:
- ID: Number shown on the image
- Label: Fastener type from model/classifier
- Specification: Human-readable details (or heuristic fallback)
- Conf: Confidence score (0.00 to 1.00)
- Box (px): Bounding box coordinates in pixel space
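A sketch of how one table row could be formatted from an enriched record; the column widths and field names here are assumptions for illustration:

```python
def format_row(rec):
    """Render one table row: ID | Label | Specification | Conf | Box (px)."""
    x1, y1, x2, y2 = rec["box_px"]
    return "{:>3} | {:<12} | {:<28} | {:.2f} | ({}, {}, {}, {})".format(
        rec["id"], rec["label"], rec["spec"], rec["conf"], x1, y1, x2, y2)
```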
- Python 3.10+
- Ollama installed and running
- A vision-capable model available in Ollama (configured in the script as qwen3-vl)
- Python packages: ollama, Pillow
Install dependencies:
pip install ollama pillow
From the project folder:
python hardware_test.py
After completion, open output_labeled.jpg.
- The model may occasionally miss small parts or merge nearby parts.
- Part “specification” text is inferred from appearance and may be approximate.
- Confidence values come from model self-assessment and should be treated as guidance, not ground truth.
- If no output appears:
  - Confirm hardware.jpg exists in the repository root.
  - Confirm the Ollama service is running.
  - Confirm the selected model name exists locally.
- If boxes look misplaced:
  - Ensure the model returns boxes in the expected XYXY format.
  - Review the logged Model Response in the terminal for malformed coordinates.
- Ollama documentation: https://github.com/ollama/ollama
- Pillow documentation: https://pillow.readthedocs.io/