|
# Distill RTM Detectors Based on MMRazor |
|
|
|
## Description |
|
|
|
To further improve the model accuracy while not introducing much additional |
|
computation cost, we apply the feature-based distillation to the training phase |
|
of these RTM detectors. In summary, our distillation strategy are threefold: |
|
|
|
(1) Inspired by [PKD](https://arxiv.org/abs/2207.02039), we first normalize |
|
the intermediate feature maps to have zero mean and unit variances before calculating |
|
the distillation loss. |
|
|
|
(2) Inspired by [CWD](https://arxiv.org/abs/2011.13256), we adopt the channel-wise |
|
distillation paradigm, which can pay more attention to the most salient regions |
|
of each channel. |
|
|
|
(3) Inspired by [DAMO-YOLO](https://arxiv.org/abs/2211.15444), the distillation |
|
process is split into two stages. 1) The teacher distills the student at the |
|
first stage (280 epochs) on strong mosaic domain. 2) The student finetunes itself |
|
on no masaic domain at the second stage (20 epochs). |
|
|
|
## Results and Models |
|
|
|
| Location | Dataset | Teacher | Student | mAP | mAP(T) | mAP(S) | Config | Download | |
|
| :------: | :-----: | :---------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------------------------------------: | :---------: | :----: | :----: | :------------------------------------------: | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | |
|
| FPN | COCO | [RTMDet-s](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco.py) | [RTMDet-tiny](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_tiny_syncbn_fast_8xb32-300e_coco.py) | 41.8 (+0.8) | 44.6 | 41.0 | [config](kd_tiny_rtmdet_s_neck_300e_coco.py) | [teacher](https://download.openmmlab.com/mmyolo/v0/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco/rtmdet_s_syncbn_fast_8xb32-300e_coco_20221230_182329-0a8c901a.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_tiny_rtmdet_s_neck_300e_coco/kd_tiny_rtmdet_s_neck_300e_coco_20230213_104240-e1e4197c.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_tiny_rtmdet_s_neck_300e_coco/kd_tiny_rtmdet_s_neck_300e_coco_20230213_104240-176901d8.json) | |
|
| FPN | COCO | [RTMDet-m](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_m_syncbn_fast_8xb32-300e_coco.py) | [RTMDet-s](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_s_syncbn_fast_8xb32-300e_coco.py) | 45.7 (+1.1) | 49.3 | 44.6 | [config](kd_s_rtmdet_m_neck_300e_coco.py) | [teacher](https://download.openmmlab.com/mmyolo/v0/rtmdet/rtmdet_m_syncbn_fast_8xb32-300e_coco/rtmdet_m_syncbn_fast_8xb32-300e_coco_20230102_135952-40af4fe8.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_s_rtmdet_m_neck_300e_coco/kd_s_rtmdet_m_neck_300e_coco_20230220_140647-446ff003.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_s_rtmdet_m_neck_300e_coco/kd_s_rtmdet_m_neck_300e_coco_20230220_140647-89862269.json) | |
|
| FPN | COCO | [RTMDet-l](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_l_syncbn_fast_8xb32-300e_coco.py) | [RTMDet-m](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_m_syncbn_fast_8xb32-300e_coco.py) | 50.2 (+0.9) | 51.4 | 49.3 | [config](kd_m_rtmdet_l_neck_300e_coco.py) | [teacher](https://download.openmmlab.com/mmyolo/v0/rtmdet/rtmdet_l_syncbn_fast_8xb32-300e_coco/rtmdet_l_syncbn_fast_8xb32-300e_coco_20230102_135928-ee3abdc4.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_m_rtmdet_l_neck_300e_coco/kd_m_rtmdet_l_neck_300e_coco_20230220_141313-b806f503.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_m_rtmdet_l_neck_300e_coco/kd_m_rtmdet_l_neck_300e_coco_20230220_141313-bd028fd3.json) | |
|
| FPN | COCO | [RTMDet-x](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_x_syncbn_fast_8xb32-300e_coco.py) | [RTMDet-l](https://github.com/open-mmlab/mmyolo/blob/main/configs/rtmdet/rtmdet_l_syncbn_fast_8xb32-300e_coco.py) | 52.3 (+0.9) | 52.8 | 51.4 | [config](kd_l_rtmdet_x_neck_300e_coco.py) | [teacher](https://download.openmmlab.com/mmyolo/v0/rtmdet/rtmdet_x_syncbn_fast_8xb32-300e_coco/rtmdet_x_syncbn_fast_8xb32-300e_coco_20221231_100345-b85cd476.pth) \|[model](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_l_rtmdet_x_neck_300e_coco/kd_l_rtmdet_x_neck_300e_coco_20230220_141912-c9979722.pth) \| [log](https://download.openmmlab.com/mmrazor/v1/rtmdet_distillation/kd_l_rtmdet_x_neck_300e_coco/kd_l_rtmdet_x_neck_300e_coco_20230220_141912-c5c4e17b.json) | |
|
|
|
## Usage |
|
|
|
### Prerequisites |
|
|
|
- [MMRazor dev-1.x](https://github.com/open-mmlab/mmrazor/tree/dev-1.x) |
|
|
|
Install MMRazor from source |
|
|
|
``` |
|
git clone -b dev-1.x https://github.com/open-mmlab/mmrazor.git |
|
cd mmrazor |
|
# Install MMRazor |
|
mim install -v -e . |
|
``` |
|
|
|
### Training commands |
|
|
|
In MMYOLO's root directory, run the following command to train the RTMDet-tiny |
|
with 8 GPUs, using RTMDet-s as the teacher: |
|
|
|
```bash |
|
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PORT=29500 ./tools/dist_train.sh configs/rtmdet/distillation/kd_tiny_rtmdet_s_neck_300e_coco.py |
|
``` |
|
|
|
### Testing commands |
|
|
|
In MMYOLO's root directory, run the following command to test the model: |
|
|
|
```bash |
|
CUDA_VISIBLE_DEVICES=0 PORT=29500 ./tools/dist_test.sh configs/rtmdet/distillation/kd_tiny_rtmdet_s_neck_300e_coco.py ${CHECKPOINT_PATH} |
|
``` |
|
|
|
### Getting student-only checkpoint |
|
|
|
After training, the checkpoint contains parameters for both student and teacher models. |
|
Run the following command to convert it to student-only checkpoint: |
|
|
|
```bash |
|
python ./tools/model_converters/convert_kd_ckpt_to_student.py ${CHECKPOINT_PATH} --out-path ${OUTPUT_CHECKPOINT_PATH} |
|
``` |
|
|
|
## Configs |
|
|
|
Here we provide detection configs and models for MMRazor in MMYOLO. For clarify, |
|
we take `./kd_tiny_rtmdet_s_neck_300e_coco.py` as an example to show how to |
|
distill a RTM detector based on MMRazor. |
|
|
|
Here is the main part of `./kd_tiny_rtmdet_s_neck_300e_coco.py`. |
|
|
|
```shell |
|
norm_cfg = dict(type='BN', affine=False, track_running_stats=False) |
|
|
|
distiller=dict( |
|
type='ConfigurableDistiller', |
|
student_recorders=dict( |
|
fpn0=dict(type='ModuleOutputs', source='neck.out_layers.0.conv'), |
|
fpn1=dict(type='ModuleOutputs', source='neck.out_layers.1.conv'), |
|
fpn2=dict(type='ModuleOutputs', source='neck.out_layers.2.conv'), |
|
), |
|
teacher_recorders=dict( |
|
fpn0=dict(type='ModuleOutputs', source='neck.out_layers.0.conv'), |
|
fpn1=dict(type='ModuleOutputs', source='neck.out_layers.1.conv'), |
|
fpn2=dict(type='ModuleOutputs', source='neck.out_layers.2.conv')), |
|
connectors=dict( |
|
fpn0_s=dict(type='ConvModuleConnector', in_channel=96, |
|
out_channel=128, bias=False, norm_cfg=norm_cfg, |
|
act_cfg=None), |
|
fpn0_t=dict( |
|
type='NormConnector', in_channels=128, norm_cfg=norm_cfg), |
|
fpn1_s=dict( |
|
type='ConvModuleConnector', in_channel=96, |
|
out_channel=128, bias=False, norm_cfg=norm_cfg, |
|
act_cfg=None), |
|
fpn1_t=dict( |
|
type='NormConnector', in_channels=128, norm_cfg=norm_cfg), |
|
fpn2_s=dict( |
|
type='ConvModuleConnector', in_channel=96, |
|
out_channel=128, bias=False, norm_cfg=norm_cfg, |
|
act_cfg=None), |
|
fpn2_t=dict( |
|
type='NormConnector', in_channels=128, norm_cfg=norm_cfg)), |
|
distill_losses=dict( |
|
loss_fpn0=dict(type='ChannelWiseDivergence', loss_weight=1), |
|
loss_fpn1=dict(type='ChannelWiseDivergence', loss_weight=1), |
|
loss_fpn2=dict(type='ChannelWiseDivergence', loss_weight=1)), |
|
loss_forward_mappings=dict( |
|
loss_fpn0=dict( |
|
preds_S=dict(from_student=True, recorder='fpn0', connector='fpn0_s'), |
|
preds_T=dict(from_student=False, recorder='fpn0', connector='fpn0_t')), |
|
loss_fpn1=dict( |
|
preds_S=dict(from_student=True, recorder='fpn1', connector='fpn1_s'), |
|
preds_T=dict(from_student=False, recorder='fpn1', connector='fpn1_t')), |
|
loss_fpn2=dict( |
|
preds_S=dict(from_student=True, recorder='fpn2', connector='fpn2_s'), |
|
preds_T=dict(from_student=False, recorder='fpn2', connector='fpn2_t')))) |
|
|
|
``` |
|
|
|
`recorders` are used to record various intermediate results during the model forward. |
|
In this example, they can help record the output of 3 `nn.Module` of the teacher |
|
and the student. Details are list in [Recorder](https://github.com/open-mmlab/mmrazor/blob/dev-1.x/docs/en/advanced_guides/recorder.md) and [MMRazor Distillation](https://zhuanlan.zhihu.com/p/596582609) (if users can read Chinese). |
|
|
|
`connectors` are adaptive layers which usually map teacher's and students features |
|
to the same dimension. |
|
|
|
`distill_losses` are configs for multiple distill losses. |
|
|
|
`loss_forward_mappings` are mappings between distill loss forward arguments and records. |
|
|
|
In addition, the student finetunes itself on no masaic domain at the last 20 epochs, |
|
so we add a new hook named `StopDistillHook` to stop distillation on time. |
|
We need to add this hook to the `custom_hooks` list like this: |
|
|
|
```shell |
|
custom_hooks = [..., dict(type='mmrazor.StopDistillHook', detach_epoch=280)] |
|
``` |
|
|