Deployment

Models written in Python need to go through an export process to become a deployable artifact. A few basic concepts about this process:

"Export method" is how a Python model is fully serialized to a deployable format. We support the following export methods:

tracing: see pytorch documentation to learn about it
scripting: see pytorch documentation to learn about it
caffe2_tracing: replace parts of the model by caffe2 operators, then use tracing.

"Format" is how a serialized model is described in a file, e.g. TorchScript, Caffe2 protobuf, ONNX format. "Runtime" is an engine that loads a serialized model and executes it, e.g., PyTorch, Caffe2, TensorFlow, onnxruntime, TensorRT, etc. A runtime is often tied to a specific format (e.g. PyTorch needs TorchScript format, Caffe2 needs protobuf format). We currently support the following combination and each has some limitations:

+----------------------------+-------------+-------------+-----------------------------+
|       Export Method        |   tracing   |  scripting  |       caffe2_tracing        |
+============================+=============+=============+=============================+
| **Formats**                | TorchScript | TorchScript | Caffe2, TorchScript, ONNX   |
+----------------------------+-------------+-------------+-----------------------------+
| **Runtime**                | PyTorch     | PyTorch     | Caffe2, PyTorch             |
+----------------------------+-------------+-------------+-----------------------------+
| C++/Python inference       | ✅          | ✅          | ✅                          |
+----------------------------+-------------+-------------+-----------------------------+
| Dynamic resolution         | ✅          | ✅          | ✅                          |
+----------------------------+-------------+-------------+-----------------------------+
| Batch size requirement     | Constant    | Dynamic     | Batch inference unsupported |
+----------------------------+-------------+-------------+-----------------------------+
| Extra runtime deps         | torchvision | torchvision | Caffe2 ops (usually already |
|                            |             |             |                             |
|                            |             |             | included in PyTorch)        |
+----------------------------+-------------+-------------+-----------------------------+
| Faster/Mask/Keypoint R-CNN | ✅          | ✅          | ✅                          |
+----------------------------+-------------+-------------+-----------------------------+
| RetinaNet                  | ✅          | ✅          | ✅                          |
+----------------------------+-------------+-------------+-----------------------------+
| PointRend R-CNN            | ✅          | ❌          | ❌                          |
+----------------------------+-------------+-------------+-----------------------------+
| Cascade R-CNN              | ✅          | ❌          | ❌                          |
+----------------------------+-------------+-------------+-----------------------------+

caffe2_tracing is going to be deprecated. We don't plan to work on additional support for other formats/runtime, but contributions are welcome.

Deployment with Tracing or Scripting

Models can be exported to TorchScript format, by either tracing or scripting. The output model file can be loaded without detectron2 dependency in either Python or C++. The exported model often requires torchvision (or its C++ library) dependency for some custom ops.

This feature requires PyTorch ≥ 1.8.

Coverage

Most official models under the meta architectures GeneralizedRCNN and RetinaNet are supported in both tracing and scripting mode. Cascade R-CNN and PointRend are currently supported in tracing. Users' custom extensions are supported if they are also scriptable or traceable.

For models exported with tracing, dynamic input resolution is allowed, but batch size (number of input images) must be fixed. Scripting can support dynamic batch size.

Usage

The main export APIs for tracing and scripting are TracingAdapter and scripting_with_instances. Their usage is currently demonstrated in test_export_torchscript.py (see TestScripting and TestTracing) as well as the deployment example. Please check that these examples can run, and then modify for your use cases. The usage now requires some user effort and necessary knowledge for each model to workaround the limitation of scripting and tracing. In the future we plan to wrap these under simpler APIs to lower the bar to use them.

Deployment with Caffe2-tracing

We provide Caffe2Tracer that performs the export logic. It replaces parts of the model with Caffe2 operators, and then export the model into Caffe2, TorchScript or ONNX format.

The converted model is able to run in either Python or C++ without detectron2/torchvision dependency, on CPU or GPUs. It has a runtime optimized for CPU & mobile inference, but not optimized for GPU inference.

This feature requires ONNX ≥ 1.6.

Coverage

Most official models under these 3 common meta architectures: GeneralizedRCNN, RetinaNet, PanopticFPN are supported. Cascade R-CNN is not supported. Batch inference is not supported.

Users' custom extensions under these architectures (added through registration) are supported as long as they do not contain control flow or operators not available in Caffe2 (e.g. deformable convolution). For example, custom backbones and heads are often supported out of the box.

Usage

The APIs are listed at the API documentation. We provide export_model.py as an example that uses these APIs to convert a standard model. For custom models/datasets, you can add them to this script.

Use the model in C++/Python

The model can be loaded in C++ and deployed with either Caffe2 or Pytorch runtime.. C++ examples for Mask R-CNN are given as a reference. Note that:

Models exported with caffe2_tracing method take a special input format described in documentation. This was taken care of in the C++ example.
The converted models do not contain post-processing operations that transform raw layer outputs into formatted predictions. For example, the C++ examples only produce raw outputs (28x28 masks) from the final layers that are not post-processed, because in actual deployment, an application often needs its custom lightweight post-processing, so this step is left for users.

To help use the Caffe2-format model in python, we provide a python wrapper around the converted model, in the Caffe2Model.__call__ method. This method has an interface that's identical to the pytorch versions of models, and it internally applies pre/post-processing code to match the formats. This wrapper can serve as a reference for how to use Caffe2's python API, or for how to implement pre/post-processing in actual deployment.

Conversion to TensorFlow

tensorpack Faster R-CNN provides scripts to convert a few standard detectron2 R-CNN models to TensorFlow's pb format. It works by translating configs and weights, therefore only support a few models.