The long-standing goal of computer vision is to gain a high-level understanding of digital images and videos, similar to how the human brain perceives objects, movement, and depth in its visual field. Recently, advances in convolutional neural networks have pushed computer vision forward dramatically. By dividing the task of human-like visual perception into bite-sized challenges such as classification, segmentation, and tracking, and by establishing benchmarks, significant progress has been made, even outperforming humans in certain areas. However, there is still a long way to go toward a unified model that performs all of these tasks at once and also works robustly in real-world scenarios, not only on a given dataset. The proposed robust one-stage segmentation and tracking model furthers this quest by unifying the tasks of panoptic segmentation and tracking. Furthermore, our goal is a model that is not bound to any specific benchmark dataset but remains robust on real-world examples. We accomplish this by extending Panoptic-DeepLab [Che+20] with a previous-offset branch, enabling it to track objects across video frames. We also train this model on multiple datasets simultaneously without tuning hyperparameters for any specific dataset.
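The ID propagation enabled by the previous-offset branch can be sketched as follows. This is an illustrative NumPy sketch under our own assumptions, not the actual implementation: we assume each detected instance has a center in the current frame and a predicted offset pointing back to its position in the previous frame, and track IDs are propagated by greedy nearest-center matching.

```python
import numpy as np

def propagate_ids(prev_centers, prev_ids, curr_centers, prev_offsets, max_dist=50.0):
    """Assign track IDs to current-frame instances.

    Each current center is warped back by its predicted previous-frame
    offset and greedily matched to the nearest unmatched previous center.
    Instances without a close-enough match start a new track.
    """
    next_id = max(prev_ids, default=-1) + 1
    curr_ids = []
    taken = set()
    for center, offset in zip(curr_centers, prev_offsets):
        warped = np.asarray(center, dtype=float) - np.asarray(offset, dtype=float)
        best, best_d = None, max_dist
        for j, pc in enumerate(prev_centers):
            if j in taken:
                continue
            d = np.linalg.norm(warped - np.asarray(pc, dtype=float))
            if d < best_d:
                best, best_d = j, d
        if best is None:  # no previous instance close enough: new track
            curr_ids.append(next_id)
            next_id += 1
        else:
            taken.add(best)
            curr_ids.append(prev_ids[best])
    return curr_ids
```

For example, an instance at (105, 102) with predicted previous-frame offset (5, 2) warps back to (100, 100) and inherits the ID of the previous instance found there, while an instance with no nearby previous center receives a fresh ID.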
Install Detectron2 following the instructions.

To train a model, run:

```
cd /path/to/detectron2/projects/Post
python train_net.py --config-file configs/KITTI-MOTS/post_R_52_os16_mg124_poly_200k_bs1_kitti_mots_crop_384_dsconv.yaml
```

Model evaluation can be done similarly:

```
cd /path/to/detectron2/projects/Post
python train_net.py --config-file configs/KITTI-MOTS/panoptic_deeplab_R_52_os16_mg124_poly_200k_bs64_crop_640_640_kitti_mots_dsconv.yaml --inference-only MODEL.WEIGHTS models/model_final_5e6da2.pkl INPUT.CROP.ENABLED False
```

If you want to benchmark the network speed without post-processing, you can run the evaluation script with `MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True`:

```
cd /path/to/detectron2/projects/Post
python train_net.py --config-file configs/KITTI-MOTS/panoptic_deeplab_R_52_os16_mg124_poly_200k_bs64_crop_640_640_kitti_mots_dsconv.yaml --eval-only MODEL.WEIGHTS /path/to/model_checkpoint MODEL.PANOPTIC_DEEPLAB.BENCHMARK_NETWORK_SPEED True
```

Cityscapes models are trained with ImageNet pretraining.
| Method | Backbone | Output resolution | PQ | SQ | RQ | mIoU | AP | Memory (M) | model id | download |
|---|---|---|---|---|---|---|---|---|---|---|
| Panoptic-DeepLab | R50-DC5 | 1024×2048 | 58.6 | 80.9 | 71.2 | 75.9 | 29.8 | 8668 | - | model \| metrics |
| Panoptic-DeepLab | R52-DC5 | 1024×2048 | 60.3 | 81.5 | 72.9 | 78.2 | 33.2 | 9682 | 30841561 | model \| metrics |
| Panoptic-DeepLab (DSConv) | R52-DC5 | 1024×2048 | 60.3 | 81.0 | 73.2 | 78.7 | 32.1 | 10466 | 33148034 | model \| metrics |
Note:
- R52: a ResNet-50 with its first 7×7 convolution replaced by three 3×3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of the PyTorch examples.
- DC5 means using dilated convolution in res5.
- We use a smaller training crop size (512×1024) than the original paper (1025×2049); we find that a larger crop size (1024×2048) can further improve PQ by 1.5% but also degrades AP by 3%.
- The implementation with regular Conv2d in the ASPP module and head is much heavier than the original paper's.
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes a similar amount of time to the network forward pass itself. Please refer to the speed reported in the original paper for comparison.
- DSConv refers to using DepthwiseSeparableConv2d in the ASPP module and decoder. The implementation with DSConv is identical to the original paper.
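As a rough illustration of why the DSConv variant is lighter, the following sketch compares parameter counts of a standard k×k convolution against a depthwise separable one (a depthwise k×k convolution with one filter per input channel, followed by a 1×1 pointwise convolution). Biases are ignored for simplicity; the channel sizes below are illustrative, not taken from the model configs.

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution (bias ignored)."""
    return c_in * k * k + c_in * c_out

# Example: a 3x3 convolution mapping 256 -> 256 channels.
regular = conv_params(256, 256, 3)      # 589,824 parameters
separable = dsconv_params(256, 256, 3)  # 67,840 parameters
print(regular, separable, round(regular / separable, 1))
```

For a 3×3 kernel at these widths the separable form uses roughly 8.7× fewer parameters, which is consistent with the note above that the regular-Conv2d heads are much heavier.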
COCO models are trained with ImageNet pretraining on 16 V100s.
| Method | Backbone | Output resolution | PQ | SQ | RQ | Box AP | Mask AP | model id | download |
|---|---|---|---|---|---|---|---|---|---|
| Panoptic-DeepLab (DSConv) | R52-DC5 | 640×640 | 35.5 | 77.3 | 44.7 | 18.6 | 19.7 | 246448865 | model \| metrics |
Note:
- R52: a ResNet-50 with its first 7×7 convolution replaced by three 3×3 convolutions. This modification has been used in most semantic segmentation papers. We pre-train this backbone on ImageNet using the default recipe of the PyTorch examples.
- DC5 means using dilated convolution in res5.
- This reproduced number matches the original paper (35.5 vs. 35.1 PQ).
- This implementation does not include the optimized post-processing code needed for deployment. Post-processing the network outputs currently takes a similar amount of time to the network forward pass itself.
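The post-processing referred to in the notes groups "thing" pixels into instances: each foreground pixel is shifted by its predicted offset and assigned to the nearest predicted instance center. The following is a minimal NumPy sketch of that grouping step under assumed array shapes, not the optimized implementation shipped with the project:

```python
import numpy as np

def group_pixels(centers, offsets, mask):
    """Assign each foreground pixel to the nearest instance center.

    centers: (K, 2) array of predicted instance centers as (y, x).
    offsets: (2, H, W) array of per-pixel offsets pointing toward the
             pixel's instance center.
    mask:    (H, W) boolean array of 'thing' (foreground) pixels.
    Returns an (H, W) instance-id map (0 = background, ids start at 1).
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    # Shift every foreground pixel by its predicted offset.
    shifted = np.stack([ys + offsets[0, ys, xs],
                        xs + offsets[1, ys, xs]], axis=1)  # (N, 2)
    # Distance of each shifted pixel to each candidate center.
    dists = np.linalg.norm(shifted[:, None, :] - centers[None, :, :], axis=2)
    ids = np.zeros((h, w), dtype=np.int64)
    ids[ys, xs] = np.argmin(dists, axis=1) + 1
    return ids
```

Because this computes an N×K distance matrix with plain NumPy broadcasting, it is easy to read but slow, which is consistent with the note that unoptimized post-processing takes about as long as the network itself.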
Detectron2 is Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms. It is a ground-up rewrite of the previous version, Detectron, and it originates from maskrcnn-benchmark.

- It is powered by the PyTorch deep learning framework.
- Includes more features such as panoptic segmentation, DensePose, Cascade R-CNN, rotated bounding boxes, PointRend, DeepLab, etc.
- Can be used as a library to support different projects built on top of it. We'll open-source more research projects in this way.
- It trains much faster.
- Models can be exported to TorchScript or Caffe2 format for deployment.
See our blog post to see more demos and learn about detectron2.
See INSTALL.md.
Follow the installation instructions to install detectron2.
See Getting Started with Detectron2, and the Colab Notebook to learn about basic usage.
Learn more at our documentation. And see projects/ for some projects that are built on top of detectron2.
We provide a large set of baseline results and trained models available for download in the Detectron2 Model Zoo.
Detectron2 is released under the Apache 2.0 license.
If you use Detectron2 in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@misc{wu2019detectron2,
author = {Yuxin Wu and Alexander Kirillov and Francisco Massa and
Wan-Yen Lo and Ross Girshick},
title = {Detectron2},
howpublished = {\url{https://github.com/facebookresearch/detectron2}},
year = {2019}
}