rodrigo-suarezmajor/post

POST: Panoptic Segmentation and Tracking

Architecture

[Architecture diagram]

Abstract

A long-standing goal of computer vision is to gain a high-level understanding of digital images and videos, similar to how the human brain perceives objects, movement and depth in its visual field. Recent advances in Convolutional Neural Networks have pushed computer vision to new limits. By dividing the task of human-like visual perception into bite-sized challenges such as classification, segmentation and tracking, and by establishing benchmarks, significant progress has been made, even outperforming humans in certain areas. However, there is still a long way to go towards a unified model that performs all these tasks at once and also works robustly in real-world scenarios rather than only on a given dataset. The proposed robust one-stage segmentation and tracking model furthers this quest by unifying the tasks of panoptic segmentation and tracking. Furthermore, our goal for this model is not to be bound to any specific benchmark dataset, but to provide robustness on real-world examples. We accomplish this by extending Panoptic-DeepLab [Che+20] with a previous-offset branch, enabling it to track objects in a video, and by training the model on multiple datasets simultaneously without tuning hyperparameters for any specific dataset.
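The idea behind a previous-offset branch can be illustrated with a small sketch: each detected instance in the current frame additionally regresses an offset pointing to where its center was in the previous frame, and instances are tracked by matching that predicted position to the previous frame's detected centers. The function below is an illustrative assumption of how such matching could work (greedy nearest-center assignment with a distance threshold), not the exact formulation used in POST.

```python
import numpy as np

def match_instances(prev_centers, curr_centers, curr_prev_offsets, max_dist=10.0):
    """Greedy tracking via a previous-frame offset (illustrative sketch).

    prev_centers:      (P, 2) instance centers detected in frame t-1
    curr_centers:      (C, 2) instance centers detected in frame t
    curr_prev_offsets: (C, 2) predicted offset from each current center
                       back to its position in the previous frame
    Returns one previous-instance index per current instance,
    or -1 for a newly appearing object.
    """
    matches = []
    for center, offset in zip(curr_centers, curr_prev_offsets):
        predicted_prev = center + offset  # where was this object at t-1?
        if len(prev_centers) == 0:
            matches.append(-1)
            continue
        dists = np.linalg.norm(prev_centers - predicted_prev, axis=1)
        best = int(np.argmin(dists))
        matches.append(best if dists[best] <= max_dist else -1)
    return matches

# A car at (50, 50) moved to (58, 50); the offset branch predicts (-8, 0),
# so the first instance keeps its track, while a far-away detection is new.
prev = np.array([[50.0, 50.0]])
curr = np.array([[58.0, 50.0], [200.0, 10.0]])
offs = np.array([[-8.0, 0.0], [0.0, 0.0]])
print(match_instances(prev, curr, offs))  # [0, -1]
```

In a full pipeline the matched index would carry the instance ID forward from frame to frame, with unmatched instances starting new tracks.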

Installation

Install Detectron2 following the instructions.

Training

To train a model, run:

cd /path/to/detectron2/projects/Post
python train_net.py --config-file configs/KITTI-MOTS/post_R_52_os16_mg124_poly_200k_bs1_kitti_mots_crop_384_dsconv.yaml

Inference

Model evaluation can be done similarly:

cd /path/to/detectron2/projects/Post
python train_net.py --config-file configs/KITTI-MOTS/panoptic_deeplab_R_52_os16_mg124_poly_200k_bs64_crop_640_640_kitti_mots_dsconv.yaml --inference-only MODEL.WEIGHTS models/model_final_5e6da2.pkl INPUT.CROP.ENABLED False
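The trailing arguments in the command above (`MODEL.WEIGHTS …`, `INPUT.CROP.ENABLED False`) are Detectron2 config overrides: a flat list of alternating dotted keys and values merged into the nested YAML config. Detectron2 does this via yacs' `merge_from_list`; the snippet below is a simplified pure-Python mimic of that behavior, for illustration only.

```python
import ast

def apply_overrides(cfg, opts):
    """Merge command-line overrides into a nested config dict.

    opts is a flat list of alternating dotted keys and values, e.g.
    ["MODEL.WEIGHTS", "m.pkl", "INPUT.CROP.ENABLED", "False"],
    mimicking yacs' merge_from_list used by Detectron2.
    """
    def parse(value):
        # yacs evaluates literals, so "False" becomes a bool, "0.1" a float
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # plain strings (e.g. file paths) stay as-is

    for key, value in zip(opts[::2], opts[1::2]):
        node = cfg
        *path, leaf = key.split(".")
        for part in path:
            node = node.setdefault(part, {})
        node[leaf] = parse(value)
    return cfg

cfg = {"MODEL": {"WEIGHTS": ""}, "INPUT": {"CROP": {"ENABLED": True}}}
apply_overrides(cfg, ["MODEL.WEIGHTS", "models/m.pkl",
                      "INPUT.CROP.ENABLED", "False"])
print(cfg["INPUT"]["CROP"]["ENABLED"])  # False
```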

Benchmark network speed

If you want to benchmark the network speed without post-processing, you can run the evaluation script with MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True:

cd /path/to/detectron2/projects/Post
python train_net.py --config-file configs/KITTI-MOTS/panoptic_deeplab_R_52_os16_mg124_poly_200k_bs64_crop_640_640_kitti_mots_dsconv.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True

Cityscapes Panoptic Segmentation

Cityscapes models are trained with ImageNet pretraining.

| Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | Memory (M) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panoptic-DeepLab | R50-DC5 | 1024×2048 | 58.6 | 80.9 | 71.2 | 75.9 | 29.8 | 8668 | - | model \| metrics |
| Panoptic-DeepLab | R52-DC5 | 1024×2048 | 60.3 | 81.5 | 72.9 | 78.2 | 33.2 | 9682 | 30841561 | model \| metrics |
| Panoptic-DeepLab (DSConv) | R52-DC5 | 1024×2048 | 60.3 | 81.0 | 73.2 | 78.7 | 32.1 | 10466 | 33148034 | model \| metrics |

Note:

  • R52: a ResNet-50 with its first 7x7 convolution replaced by 3 3x3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of pytorch examples.
  • DC5 means using dilated convolution in res5.
  • We use a smaller training crop size (512x1024) than the original paper (1025x2049). We find that using a larger crop size (1024x2048) can further improve PQ by 1.5% but also degrades AP by 3%.
  • The implementation with regular Conv2d in ASPP and head is much heavier than the original paper's.
  • This implementation does not include optimized post-processing code needed for deployment. Post-processing the network outputs currently takes a similar amount of time as the network itself. Please refer to the speed numbers in the original paper for comparison.
  • DSConv refers to using DepthwiseSeparableConv2d in ASPP and decoder. The implementation with DSConv is identical to the original paper.
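To see why DepthwiseSeparableConv2d in ASPP and the decoder is so much lighter than a regular Conv2d, compare parameter counts: a depthwise k×k convolution per input channel plus a 1×1 pointwise convolution replaces one dense k×k convolution. The channel and kernel sizes below are illustrative, not the exact ones used in the heads.

```python
def conv2d_params(c_in, c_out, k):
    """Weight count of a regular k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dsconv2d_params(c_in, c_out, k):
    """Depthwise separable conv: a k x k depthwise conv per input
    channel, followed by a 1 x 1 pointwise conv that mixes channels."""
    return c_in * k * k + c_in * c_out

# Illustrative head-sized layer: 256 -> 256 channels, 5 x 5 kernel
regular = conv2d_params(256, 256, 5)      # 1,638,400 weights
separable = dsconv2d_params(256, 256, 5)  # 71,936 weights
print(regular // separable)  # roughly 22x fewer parameters
```

The same arithmetic explains the memory gap between the Conv2d and DSConv rows in the tables above.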

COCO Panoptic Segmentation

COCO models are trained with ImageNet pretraining on 16 V100s.

| Method | Backbone | Output resolution | PQ | SQ | RQ | Box AP | Mask AP | Memory (M) | model id | download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panoptic-DeepLab (DSConv) | R52-DC5 | 640×640 | 35.5 | 77.3 | 44.7 | 18.6 | 19.7 | - | 246448865 | model \| metrics |

Note:

  • R52: a ResNet-50 with its first 7x7 convolution replaced by 3 3x3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of pytorch examples.
  • DC5 means using dilated convolution in res5.
  • This reproduced number matches the original paper (35.5 vs. 35.1 PQ).
  • This implementation does not include optimized post-processing code needed for deployment. Post-processing the network outputs currently takes a similar amount of time as the network itself. Please refer to the speed numbers in the original paper for comparison.

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

What's New

  • It is powered by the PyTorch deep learning framework.
  • Includes more features such as panoptic segmentation, Densepose, Cascade R-CNN, rotated bounding boxes, PointRend, DeepLab, etc.
  • Can be used as a library to support different projects on top of it. We'll open source more research projects in this way.
  • It trains much faster.
  • Models can be exported to TorchScript format or Caffe2 format for deployment.

See our blog post for more demos and to learn about Detectron2.

Installation

See INSTALL.md.

Getting Started

Follow the installation instructions to install detectron2.

See Getting Started with Detectron2, and the Colab Notebook to learn about basic usage.

Learn more at our documentation. And see projects/ for some projects that are built on top of detectron2.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Detectron2 Model Zoo.

License

Detectron2 is released under the Apache 2.0 license.

Citing Detectron2

If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@misc{wu2019detectron2,
  author =       {Yuxin Wu and Alexander Kirillov and Francisco Massa and
                  Wan-Yen Lo and Ross Girshick},
  title =        {Detectron2},
  howpublished = {\url{https://github.com/facebookresearch/detectron2}},
  year =         {2019}
}

About

Panoptic One-shot Segmentation and Tracking
