An object detection system that identifies and classifies welding joints and weld-related regions in photographs using YOLOv8, with a web application interface for easy use.
Ideal target (5 joint types): Butt, T-Joint, Lap, Corner, Edge. In practice, public datasets often provide seam or defect/quality labels. This repo supports:
- Training on merged Roboflow datasets (9 classes: Good/Bad Welding, Crack, Porosity, Spatters, seam, defect, flame, etc.).
- Optional joint-type classifier so that when the detector only has a “seam” class, the app can show Butt / T-Joint / Lap.
Reference (5 joint types):
| ID | Joint Type | Description |
|---|---|---|
| 0 | Butt Joint | Two pieces end-to-end in the same plane |
| 1 | T-Joint | Perpendicular pieces forming a T shape |
| 2 | Lap Joint | Overlapping pieces welded at edges |
| 3 | Corner Joint | Pieces meeting at a corner (L shape) |
| 4 | Edge Joint | Parallel edges aligned and welded |
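The five-class reference above can be kept as a small lookup table in code. A sketch only: the authoritative class names and IDs for a trained model come from its dataset.yaml, which may differ from this ideal mapping:

```python
# Reference mapping for the 5 ideal joint types (IDs as in the table above).
# Note: the class list actually used for training is defined in dataset.yaml
# and may differ (e.g. the 9-class merged dataset).
JOINT_TYPES = {
    0: "Butt Joint",
    1: "T-Joint",
    2: "Lap Joint",
    3: "Corner Joint",
    4: "Edge Joint",
}


def joint_name(class_id: int) -> str:
    """Return the joint-type name for a class ID, or 'Unknown' if unmapped."""
    return JOINT_TYPES.get(class_id, "Unknown")
```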
Documentation: See docs/PROJECT_SUMMARY.md for a full description of everything implemented (data, training, web app) and docs/README.md for the docs index.
```
Joint_Classifier/
├── data/
│   ├── raw/                      # Scraped images (by joint type)
│   ├── annotations/              # Roboflow exports (single or roboflow_all)
│   ├── yolo_dataset/             # Single-project YOLO dataset
│   └── yolo_dataset_merged/      # Merged multi-project dataset (9 classes)
│       ├── images/{train,val,test}/
│       ├── labels/{train,val,test}/
│       └── dataset.yaml
├── scraper/
│   ├── scrape_images.py          # Web image collection
│   ├── validate_images.py        # Image validation & cleanup
│   ├── download_roboflow.py      # Roboflow download (single or --all-welding)
│   ├── merge_roboflow.py         # Merge multiple Roboflow exports
│   ├── sources.py                # Search queries & sources
│   └── config.json               # Scraping configuration
├── src/
│   ├── train_yolo.py             # YOLOv8 training
│   ├── train_joint_classifier.py # Optional 3-class joint classifier
│   ├── evaluate.py               # Model evaluation
│   ├── inference.py              # Detection API (WeldingJointDetector)
│   ├── prepare_dataset.py        # Dataset preparation
│   └── annotation_helper.py      # Annotation utilities
├── models/
│   └── runs/                     # Training runs (e.g. weld_merged)
├── webapp/
│   ├── app.py                    # FastAPI app (detect + training-data browser)
│   ├── templates/
│   └── static/
├── docs/                         # Documentation (see docs/README.md)
├── requirements.txt
├── Dockerfile
├── docker-compose.yml
├── .gitignore
└── README.md
```
```bash
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate   # Linux/Mac
# venv\Scripts\activate    # Windows

# Install packages
pip install -r requirements.txt
```

Option A: Use existing datasets (recommended first step)
See docs/EXISTING_DATASETS.md for a full list. Note: There is no large public dataset that labels the 5 joint types (Butt, T, Lap, Corner, Edge); most public sets are for defects or bead segmentation. The doc lists what exists and how to get more data.
Roboflow (YOLO-ready):
```bash
# List known welding datasets on Roboflow
python scraper/download_roboflow.py --list

# Download a dataset (get API key from https://app.roboflow.com/settings/api)
export ROBOFLOW_API_KEY=your_key
python scraper/download_roboflow.py --workspace college-izka9 --project welding-jointb --version 1 --output data/annotations/roboflow_export
python src/prepare_dataset.py --from-export data/annotations/roboflow_export
```

Roboflow – multiple welding projects (recommended for more data):
```bash
# Get your key from https://app.roboflow.com/settings/api (Account → Roboflow Keys)
# Then set it (replace YOUR_ACTUAL_KEY with the key from the dashboard):
export ROBOFLOW_API_KEY=YOUR_ACTUAL_KEY
python scraper/download_roboflow.py --all-welding
```

This downloads three projects into data/annotations/roboflow_all/:
- welding_jointb (64 images, seam)
- welding_seam (65 images, seam)
- weld_quality (3.7k images, 6 classes: Good Welding, Bad Welding, Crack, Porosity, etc.)
Then merge them into one dataset and train:

```bash
python scraper/merge_roboflow.py --input data/annotations/roboflow_all --output data/yolo_dataset_merged
python src/train_yolo.py --data data/yolo_dataset_merged/dataset.yaml
```

Note: The merge script writes an absolute path in dataset.yaml. After cloning on another machine, edit data/yolo_dataset_merged/dataset.yaml and set path to a relative path (e.g. data/yolo_dataset_merged) or the project root so training works from any checkout.
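The dataset.yaml edit described in the note can also be scripted. A minimal sketch, assuming PyYAML is installed (it ships with the Ultralytics stack) and using the standard Ultralytics `path` key; the function name `make_dataset_portable` is illustrative, not part of the repo:

```python
# Sketch: rewrite the absolute 'path' entry that merge_roboflow.py writes
# into dataset.yaml so the merged dataset works from any checkout.
# Assumes PyYAML is available; run from the project root.
from pathlib import Path

import yaml


def make_dataset_portable(yaml_file: str, rel_path: str = "data/yolo_dataset_merged") -> None:
    """Replace the absolute 'path' value with a path relative to the project root."""
    p = Path(yaml_file)
    cfg = yaml.safe_load(p.read_text())
    cfg["path"] = rel_path  # other keys (train/val/test, names) are left untouched
    p.write_text(yaml.safe_dump(cfg, sort_keys=False))
```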
Other public datasets (RIAWELC, Mendeley, IEEE WELD):
```bash
python scraper/download_public_datasets.py --instructions   # Print download links
python scraper/download_public_datasets.py --riawelc        # Clone RIAWELC (24k X-ray images)
```

Option B: Scrape from the web
```bash
# Download images for all joint types
python scraper/scrape_images.py

# Download for a specific joint type
python scraper/scrape_images.py --joint t_joint

# Limit downloads per query
python scraper/scrape_images.py --limit 30

# Validate downloaded images
python scraper/scrape_images.py --validate-only

# Check dataset summary
python scraper/scrape_images.py --summary
```

Images are saved to data/raw/<joint_type>/.
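If you want a quick per-class count without running the scraper's `--summary` flag, a few lines of standard-library Python will do. This is a sketch that only assumes the `data/raw/<joint_type>/` layout described above; `raw_summary` is an illustrative helper, not part of the repo:

```python
# Sketch: count scraped images per joint type under data/raw/.
# Assumes the scraper's layout of one subdirectory per joint type.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def raw_summary(raw_dir: str = "data/raw") -> dict:
    """Return {joint_type: image_count} for each subdirectory of raw_dir."""
    root = Path(raw_dir)
    if not root.is_dir():
        return {}
    return {
        d.name: sum(1 for f in d.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```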
You need to draw bounding boxes around welding joints in each image.
Option A: Label Studio (Recommended)
```bash
pip install label-studio
label-studio start

# See setup guide:
python src/annotation_helper.py guide
```

Option B: Roboflow (Cloud-based)
- Go to roboflow.com and create a free account
- Upload images from `data/raw/`
- Annotate with bounding boxes using 5 classes
- Export in YOLOv8 format
- Place exported files in `data/annotations/`
Option C: CVAT
Use CVAT for local annotation, then export in YOLO format.
```bash
# From manually organized annotations
python src/prepare_dataset.py

# From annotation tool export
python src/prepare_dataset.py --from-export data/annotations/export

# Custom split ratios
python src/prepare_dataset.py --train 0.8 --val 0.15 --test 0.05

# Check annotation statistics
python src/annotation_helper.py stats data/annotations
```

```bash
# Default training (YOLOv8m, 100 epochs)
python src/train_yolo.py

# Custom settings
python src/train_yolo.py --model m --epochs 100 --batch 16 --device 0

# Smaller model for faster training
python src/train_yolo.py --model s --epochs 50

# CPU training (slower)
python src/train_yolo.py --model s --device cpu

# Resume interrupted training
python src/train_yolo.py --resume models/runs/welding_joint_detector/weights/last.pt
```

Model sizes:
| Size | Params | Speed | Accuracy | GPU Memory |
|---|---|---|---|---|
| n | 3.2M | Fastest | Lower | ~4GB |
| s | 11.2M | Fast | Good | ~6GB |
| m | 25.9M | Medium | Better | ~8GB |
| l | 43.7M | Slower | High | ~10GB |
| x | 68.2M | Slowest | Highest | ~12GB |
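The size/memory trade-off above can be encoded as a simple helper for scripted runs. A sketch using the approximate GPU-memory figures from the table; `pick_model_size` is an illustrative function, not part of the repo:

```python
# Sketch: choose the largest YOLOv8 size that fits in the available GPU
# memory, using the approximate figures from the table above.
def pick_model_size(gpu_mem_gb: float) -> str:
    """Return 'n'/'s'/'m'/'l'/'x' for use with train_yolo.py --model."""
    for size, mem_needed_gb in [("x", 12), ("l", 10), ("m", 8), ("s", 6), ("n", 4)]:
        if gpu_mem_gb >= mem_needed_gb:
            return size
    return "n"  # fall back to the smallest model (or train on CPU)
```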
```bash
# Evaluate on test set
python src/evaluate.py --model models/runs/welding_joint_detector/weights/best.pt

# Evaluate with visualizations
python src/evaluate.py --model best.pt --visualize data/yolo_dataset/images/test/

# Custom confidence threshold
python src/evaluate.py --model best.pt --conf 0.5
```

```bash
# Start web server
python webapp/app.py --model models/runs/welding_joint_detector/weights/best.pt

# Custom host/port
python webapp/app.py --model best.pt --host 0.0.0.0 --port 8080

# Development mode with auto-reload
python webapp/app.py --model best.pt --reload
```

Open http://localhost:8000 in your browser.
```bash
# Single image
python src/inference.py --model best.pt --source image.jpg

# Directory of images
python src/inference.py --model best.pt --source images_folder/

# Save results as JSON
python src/inference.py --model best.pt --source image.jpg --save-json
```

Detect welding joints in an uploaded image.
Parameters:
- `file` (form-data): Image file (JPEG, PNG, BMP, WebP)
- `confidence` (query, optional): Confidence threshold (0.0-1.0, default: 0.25)
- `iou` (query, optional): IoU threshold for NMS (0.0-1.0, default: 0.45)
Response:
```json
{
  "success": true,
  "num_detections": 2,
  "detections": [
    {
      "class_id": 1,
      "class_name": "T-Joint",
      "confidence": 0.92,
      "bbox": [120.5, 80.3, 450.2, 310.7],
      "bbox_normalized": [0.1875, 0.1672, 0.7034, 0.6473]
    }
  ],
  "annotated_image": "data:image/jpeg;base64,...",
  "inference_time_ms": 45.2
}
```

Health check endpoint.
List all detectable joint types.
Visit http://localhost:8000/docs for Swagger UI documentation.
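A minimal Python client for the detection endpoint might look like the sketch below. It assumes the `requests` package and a `/detect` path whose field names mirror the parameter list above; verify the actual route and parameters against the Swagger UI before relying on it. `summarize` is an illustrative helper for the response shape shown above:

```python
# Sketch of a client for the detection endpoint. Assumes the `requests`
# package and a POST /detect route -- confirm both against /docs.
import requests


def detect(image_path: str, confidence: float = 0.25, iou: float = 0.45,
           base_url: str = "http://localhost:8000") -> dict:
    """Upload an image and return the parsed JSON detection response."""
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"{base_url}/detect",
            params={"confidence": confidence, "iou": iou},
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()


def summarize(result: dict) -> list:
    """Format detections like 'T-Joint: 92%' from the response shown above."""
    return [f"{d['class_name']}: {d['confidence']:.0%}"
            for d in result.get("detections", [])]
```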
```bash
# Build and run
docker-compose up --build

# Or without compose
docker build -t welding-detector .
docker run -p 8000:8000 -v ./models:/app/models welding-detector
```

```python
from src.inference import WeldingJointDetector

# Initialize detector
detector = WeldingJointDetector("models/runs/welding_joint_detector/weights/best.pt")

# Detect joints
detections = detector.detect("image.jpg", conf_threshold=0.5)
for det in detections:
    print(f"{det['class_name']}: {det['confidence']:.0%}")

# Detect and visualize
annotated_img, detections = detector.detect_and_visualize("image.jpg")

# Save annotated image
import cv2
cv2.imwrite("result.jpg", annotated_img)
```

| Metric | Minimum | Good | Excellent |
|---|---|---|---|
| mAP50 | >0.70 | >0.85 | >0.90 |
| mAP50-95 | >0.50 | >0.65 | >0.75 |
| Precision | >0.80 | >0.85 | >0.90 |
| Recall | >0.75 | >0.80 | >0.85 |
| Inference | <500ms | <200ms | <100ms |
No GPU detected:
```bash
# Check CUDA availability
python -c "import torch; print(torch.cuda.is_available())"

# Train on CPU (slower)
python src/train_yolo.py --device cpu
```

Out of memory during training:
- Reduce batch size: `--batch 8` or `--batch 4`
- Use a smaller model: `--model s` or `--model n`
- Reduce image size: `--img-size 416`
Low accuracy:
- Collect more training images (target: 200+ per class)
- Review annotation quality
- Try a larger model
- Increase training epochs
- Ensure dataset is balanced
Web app shows "No Model":
- Train a model first
- Specify model path: `python webapp/app.py --model path/to/best.pt`
- Or set environment variable: `export MODEL_PATH=path/to/best.pt`
| Document | Description |
|---|---|
| docs/PROJECT_SUMMARY.md | Full summary of what was built — data collection (scraping, Roboflow download/merge), training (YOLO + optional joint classifier), web app, inference. Use for onboarding or before uploading to GitHub. |
| docs/README.md | Index of all documentation. |
| docs/EXISTING_DATASETS.md | Public welding datasets and download options. |
| docs/NEXT_STEPS.md | What to do after training (evaluate, run app, optional improvements). |
- Do not commit secrets. Use a `.env` file or environment variables for `ROBOFLOW_API_KEY`; `.gitignore` already excludes `.env`.
- Large assets (models, data) are not in the repo. They are listed in `.gitignore` and synced via Google Drive. See docs/DRIVE_SYNC.md for:
  - What to upload to Drive (e.g. `models/runs/`, `data/yolo_dataset_merged/`) using `scripts/drive_upload.py`
  - How to download after clone with `scripts/drive_download.py` or by setting `GDRIVE_FOLDER_ID` when starting the app
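A fail-fast check for the API key can be added to any script that talks to Roboflow. Standard library only; the variable name matches the `ROBOFLOW_API_KEY` convention used above, and `require_api_key` is an illustrative helper, not part of the repo:

```python
# Sketch: fail fast if ROBOFLOW_API_KEY is missing, instead of hitting an
# authentication error halfway through a download. Standard library only.
import os


def require_api_key(name: str = "ROBOFLOW_API_KEY") -> str:
    """Return the key from the environment, or raise with a helpful message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or put it in .env")
    return key
```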
- Full project narrative: See docs/PROJECT_SUMMARY.md for everything that was implemented and how to run it after clone.
This project is for educational and research purposes.