
DEIMKit is a Python wrapper for DEIM: DETR with Improved Matching for Fast Convergence. Check out the original repo for more details.
Why DEIMKit? | Key Features | Installation | Usage | Inference | Training | Export
DEIMKit provides a practical toolkit for using the DEIM object detector in real-world applications. The original DEIM repo provides a great implementation of the DEIM object detector, and DEIMKit adds useful features for training, inference, exporting, and deploying the model.
- Pure Python Configuration - No complicated YAML files, just clean Python code
- Cross-Platform Simplicity - Single command installation on Linux, macOS, and Windows
- Intuitive API - Load, train, predict, export in just a few lines of code
- Inference
- Single Image & Batch Prediction
- Load Pretrained & Custom Models
- Built-in Result Visualization
- Live ONNX Inference (Webcam, Video, Image)
- Training
- Single & Multi-GPU Training
- Custom Dataset Support (COCO Format)
- Flexible Configuration via Pure Python
- Export
- Export Trained Models to ONNX
- ONNX Model with Integrated Preprocessing
- Utilities & Demos
- Extensive metric logging for debugging
- Cross-Platform Support (Linux, macOS, Windows)
- Pixi Environment Management Integration
- Interactive Gradio Demo Script
If you're installing with pip, first install torch and torchvision as prerequisites.
Next, install the package.

Bleeding edge version:
pip install git+https://github.com/dnth/DEIM.git
Stable version:
pip install git+https://github.com/dnth/DEIM.git@v0.2.1
Tip
I recommend using Pixi to run this package. Pixi makes it easy to install the right Python version and dependencies on any platform!
Install Pixi if you're on Linux or macOS.
curl -fsSL https://pixi.sh/install.sh | bash
For Windows, you can use the following command.
powershell -ExecutionPolicy ByPass -c "irm -useb https://pixi.sh/install.ps1 | iex"
Clone this repo, navigate into its base directory, and run the quickstart task:
git clone https://github.com/dnth/DEIMKit.git
cd DEIMKit
pixi run quickstart
This downloads a toy dataset with 8 images, trains a model on it for 3 epochs, and runs inference on it. It shouldn't take more than a minute to complete.
If this runs without any issues, you've got a working Python environment with all the dependencies installed. This also installs DEIMKit in editable mode for development. See the pixi cheatsheet below for more.
List models supported by DEIMKit
from deimkit import list_models
list_models()
['deim_hgnetv2_n',
'deim_hgnetv2_s',
'deim_hgnetv2_m',
'deim_hgnetv2_l',
'deim_hgnetv2_x']
Load a pretrained model from the original authors
from deimkit import load_model
coco_classes = ["aeroplane", ... "zebra"]
model = load_model("deim_hgnetv2_x", class_names=coco_classes)
Load a custom trained model
model = load_model(
    "deim_hgnetv2_s",
    checkpoint="deim_hgnetv2_s_coco_cells/best.pth",
    class_names=["cell", "platelet", "red_blood_cell", "white_blood_cell"],
    image_size=(320, 320),
)
Run inference on an image
result = model.predict(image_path, visualize=True)
Access the visualization
result.visualization
You can also run batch inference
results = model.predict_batch(image_paths, visualize=True, batch_size=8)
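For example, to run batch prediction over a whole folder, collect the image paths yourself and pass them in. A minimal sketch, assuming a placeholder folder path and that predict_batch returns one result per input image:

from glob import glob

# Gather image paths from a folder; the path and extension are placeholders
image_paths = sorted(glob("dataset/test/*.jpg"))

results = model.predict_batch(image_paths, visualize=True, batch_size=8)

# With visualize=True, each result carries its rendered visualization,
# just like the single-image predict call above
for path, result in zip(image_paths, results):
    vis = result.visualization
    print(f"{path}: visualization of type {type(vis).__name__}")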
Here are some sample results I got by training on custom datasets.
See the demo notebook on using pretrained models and custom model inference for more details.
DEIMKit provides a simple interface for training your own models.
To start, configure the dataset. Specify the model, the dataset path, batch size, etc.
from deimkit import Trainer, Config, configure_dataset, configure_model
conf = Config.from_model_name("deim_hgnetv2_s")
# Optional
conf = configure_model(
    config=conf,
    num_queries=100,  # Optional, default is 300
    pretrained=True,  # Optional, default is True
    freeze_at=-1,     # Optional, default is -1 (no freezing)
)
# Required
conf = configure_dataset(
    config=conf,
    image_size=(640, 640),
    train_ann_file="dataset/PCB Holes.v4i.coco/train/_annotations.coco.json",
    train_img_folder="dataset/PCB Holes.v4i.coco/train",
    val_ann_file="dataset/PCB Holes.v4i.coco/valid/_annotations.coco.json",
    val_img_folder="dataset/PCB Holes.v4i.coco/valid",
    train_batch_size=16,
    val_batch_size=16,
    num_classes=2,
    output_dir="./outputs/deim_hgnetv2_s_pcb",
)
trainer = Trainer(conf)
# Optional - Load from a previously trained checkpoint
trainer.load_checkpoint("previous_best.pth")
# All arguments are optional; if not specified, the default values for the model will be used.
trainer.fit(
    epochs=100,            # Number of training epochs
    save_best_only=True,   # Save only the best model checkpoint
    lr=0.0001,             # Learning rate
    lr_gamma=0.1,          # Learning rate annealing factor
    weight_decay=0.0001,   # Weight decay
)
To run multi-GPU training (4 GPUs, for example), place your code into a .py file, e.g. train.py, and use the following command.
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=7778 --nproc_per_node=4 train.py
Adjust CUDA_VISIBLE_DEVICES and --nproc_per_node to match the number of GPUs available on your system.
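For reference, train.py can be a plain script that reuses the same API shown above. A minimal sketch, reusing the example PCB dataset paths from earlier (adjust them to your own data):

# train.py - minimal training script for use with torchrun
from deimkit import Trainer, Config, configure_dataset

conf = Config.from_model_name("deim_hgnetv2_s")

conf = configure_dataset(
    config=conf,
    image_size=(640, 640),
    train_ann_file="dataset/PCB Holes.v4i.coco/train/_annotations.coco.json",
    train_img_folder="dataset/PCB Holes.v4i.coco/train",
    val_ann_file="dataset/PCB Holes.v4i.coco/valid/_annotations.coco.json",
    val_img_folder="dataset/PCB Holes.v4i.coco/valid",
    train_batch_size=16,
    val_batch_size=16,
    num_classes=2,
    output_dir="./outputs/deim_hgnetv2_s_pcb",
)

trainer = Trainer(conf)
trainer.fit(epochs=100)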
Caution
Your dataset should be in COCO format. The class index should start from 0. Refer to the structure of a sample dataset exported from Roboflow. From my tests this works for DEIMKit.
The num_classes value should be the number of classes in your dataset + 1 for the background class.
Monitor training progress
tensorboard --logdir ./outputs/deim_hgnetv2_s_pcb
Point --logdir to the output_dir directory.
Navigate to http://localhost:6006/ in your browser to view the training progress.
Currently, the export function only exports the model to ONNX so it can be run with ONNXRuntime (see Live Inference for more details). I think one could get pretty far with this even on a low-resource machine. Drop an issue if you think this should be extended to other formats.
from deimkit.exporter import Exporter
from deimkit.config import Config
config = Config("config.yml")
exporter = Exporter(config)
output_path = exporter.to_onnx(
    checkpoint_path="model.pth",
    output_path="model.onnx",
)
Note
The exported model will accept raw BGR images of any size. It will also handle the preprocessing internally. Credit to PINTO0309 for the implementation.
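As a quick sanity check, you can run the exported model once with ONNXRuntime. This is only a sketch: the exact input name, dtype, and layout of the exported graph may differ, so inspect sess.get_inputs() first and adapt accordingly (sample.jpg is a placeholder path):

import cv2
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Inspect what the exported graph actually expects
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)

img = cv2.imread("sample.jpg")  # OpenCV loads images as BGR by default
# Assumption: the graph takes a single batched raw BGR image; adjust if
# the printed input shape/dtype above says otherwise
outputs = sess.run(None, {sess.get_inputs()[0].name: img[np.newaxis, ...]})
print([o.shape for o in outputs])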
Tip
If you want to export to OpenVINO you can do so directly from the ONNX model.
import onnx
from onnx import helper
model = onnx.load("best.onnx")
# Change the mode attribute of the GridSample nodes from 'linear' to 'bilinear',
# as the 'linear' mode is not supported in OpenVINO
for node in model.graph.node:
    if node.op_type == 'GridSample':
        for i, attr in enumerate(node.attribute):
            if attr.name == 'mode' and attr.s == b'linear':
                # Replace 'linear' with 'bilinear'
                node.attribute[i].s = b'bilinear'

# Save the modified model
onnx.save(model, "best_prep_openvino.onnx")
You can then use the live inference script to run inference on the OpenVINO model.
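Alternatively, if you have the openvino package installed, you can load the patched ONNX model directly with the OpenVINO runtime. A minimal sketch, not part of DEIMKit itself (assumes a recent openvino release that exposes openvino.Core):

import openvino as ov

core = ov.Core()
model = core.read_model("best_prep_openvino.onnx")  # ONNX is read directly
compiled = core.compile_model(model, "CPU")

# Inspect the compiled model's inputs/outputs before wiring up inference
print(compiled.inputs)
print(compiled.outputs)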
Run a Gradio app to interact with your model. The app will accept raw BGR images of any size. It will also handle the preprocessing internally using the exported ONNX model.
python scripts/gradio_demo.py \
--model "best.onnx" \
--classes "classes.txt" \
--examples "Rock Paper Scissors SXSW.v14i.coco/test"
gradio-demo.mp4
Note
The demo app uses the ONNX model and ONNXRuntime for inference. Additionally, I have made the ONNX model accept any input size, even though the original model was trained on 640x640 images. This means you can use any image size you want. Play around with the input size slider to see what works best for your model. Some objects are still detected even at lower input sizes, which means you can use a lower input size to speed up inference.
Tip
Live inference is provided as an independent script so that you can copy it onto any deployment device without having to install all the (large) dependencies in this repo. The script only requires onnxruntime, cv2, and numpy to run.
There are two live inference scripts:
- live_inference.py - Run live inference on a video, image, or webcam using ONNXRuntime. This script requires a pre-exported ONNX model.
- live_inference_pretrained.py - Automatically download the pretrained model, convert it to ONNX, and run inference in one go.
Run live inference on a video, image or webcam using ONNXRuntime. This runs on CPU by default.
If you would like to use the CUDA backend, install the onnxruntime-gpu package and uninstall the onnxruntime package.
For running inference on a webcam, set the --webcam flag.
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--webcam # Use webcam as input source
--classes classes.txt # Path to the classes file with each name on a new row
--inference-size 720 # Input size for the model
--provider tensorrt # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
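For reference, the classes file is a plain text file with one class name per line, for example (using the blood cell class names from the earlier custom model example):

cell
platelet
red_blood_cell
white_blood_cell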
Because the preprocessing is handled inside the ONNX model, the input size is not limited to the original 640x640; you can use any input size you want for inference, even though the model was trained on 640x640 images. Integrating the preprocessing into the ONNX model also lets us run inference at very high FPS, as it uses more efficient ONNX operators.
The following is a model I trained on a custom dataset using the deim_hgnetv2_s model and exported to ONNX. Here are some examples of inference on a webcam at different video resolutions.
Webcam video width at 1920x1080 pixels (1080p):
1920.mp4
Webcam video width at 1280x720 pixels (720p):
1280.mp4
Webcam video width at 848x480 pixels (480p):
848.mp4
Webcam video width at 640x480 pixels (480p):
640.mp4
Webcam video width at 320x240 pixels (240p):
320.mp4
To run live inference with a pretrained model on a webcam, use the following command.
python scripts/live_inference_pretrained.py --webcam
This downloads the pretrained model, converts it to ONNX, and runs inference on the webcam. All other arguments are optional and are similar to those of the live_inference.py script.
The output is as follows
deimkit-pretrained-inference.mp4
For video inference, specify the path to the video file as the input. The output video will be saved as onnx_result.mp4 in the current directory.
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--video video.mp4 # Path to the input video file
--classes classes.txt # Path to the classes file with each name on a new row
--inference-size 320 # Input size for the model (renamed from --video-width)
--provider cpu # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
video-inf.mp4
The following is an inference using the pre-trained model deim_hgnetv2_x trained on COCO.
python scripts/live_inference_pretrained.py --model deim_hgnetv2_x --video video.mp4
deimkit.inference.mp4
For image inference, specify the path to the image file as the input.
python scripts/live_inference.py
--model model.onnx # Path to the ONNX model file
--image image.jpg # Path to the input image file
--classes classes.txt # Path to the classes file with each name on a new row
--provider cpu # Execution provider (cpu/cuda/tensorrt)
--threshold 0.3 # Detection confidence threshold
The following is a demo of image inference
Tip
If you are using Pixi, you can run the live inference script with the following command with the same arguments as above.
pixi run --environment cuda live-inference
--onnx model.onnx
--webcam
--class-names classes.txt
--inference-size 320
Under the hood, this automatically pulls the onnxruntime-gpu package into the cuda environment and uses the GPU for inference!
If you want to use the CPU, replace cuda with cpu in the command above.
Here are some useful tasks you can run with Pixi. You must install pixi on your machine first. See the installation section for more details.
Note
For all commands below, you can add -e cuda to run in a CUDA-enabled environment instead of CPU.
# Check environment setup (CPU)
pixi run quickstart
# Check environment setup (CUDA)
pixi run -e cuda quickstart
# Live inference with pretrained model (webcam)
pixi run -e cuda live-inference-pretrained --webcam
# Live inference with custom ONNX model (webcam)
pixi run -e cuda live-inference \
--onnx model.onnx \
--webcam \
--provider cuda \
--class-names classes.txt \
--inference-size 640
# Video inference (CPU)
pixi run -e cpu live-inference \
--onnx model.onnx \
--input video.mp4 \
--class-names classes.txt \
--inference-size 320
# Launch Gradio demo with examples
pixi run gradio-demo \
--model "best_prep.onnx" \
--classes "classes.txt" \
--examples "Rock Paper Scissors SXSW.v14i.coco/test"
# Launch Gradio demo (CPU only)
pixi run -e cpu gradio-demo
# Train model (CUDA)
pixi run -e cuda train-model
# Train model (CPU)
pixi run -e cpu train-model
# Export model to ONNX
pixi run export \
--config config.yml \
--checkpoint model.pth \
--output model.onnx
Tip
For TensorRT inference, set the LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=".pixi/envs/cuda/lib/python3.11/site-packages/tensorrt_libs:$LD_LIBRARY_PATH"
I'm not affiliated with the original DEIM authors. I just found the model interesting and wanted to try it out. The changes made here are my own. Please cite and star the original repo if you find this useful.