---
license: apache-2.0
language:
- en
---

# [Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation](https://yannqi.github.io/AVS-COMBO/)

[Qi Yang](https://yannqi.github.io/), Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan and [Shiming Xiang](https://people.ucas.ac.cn/~xiangshiming)

This repository provides the pretrained checkpoints for the paper "Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation", accepted by CVPR 2024.

## 🔥 What's New

- (2024.03.14) Our checkpoints are available to the public!
- (2024.03.12) Our code is available to the public 🌲!
- (2024.02.27) Our paper (COMBO) was accepted by CVPR 2024!
- (2023.11.17) We completed the implementation of COMBO and pushed the code.

## 🛠️ Getting Started

### 1. Environments

- Linux or macOS with Python ≥ 3.6

```shell
# recommended
pip install -r requirements.txt
pip install soundfile

# build MSDeformAttention
cd model/modeling/pixel_decoder/ops
sh make.sh
```

- Preprocessing for detectron2

  To use the Siam-Encoder Module (SEM), one line of detectron2 needs to be changed. The file that requires attention is located at:

  `conda_envs/xxx/lib/python3.xx/site-packages/detectron2/checkpoint/c2_model_loading.py` (replace `xxx` with your own environment)

  Commenting out the following line at [L287](https://github.com/facebookresearch/detectron2/blob/cc9266c2396d5545315e3601027ba4bc28e8c95b/detectron2/checkpoint/c2_model_loading.py#L287) allows the code to run without errors:

```python
# raise ValueError("Cannot match one checkpoint key to multiple keys in the model.")
```

- Install Semantic-SAM (Optional)

```shell
# Semantic-SAM
pip install git+https://github.com/cocodataset/panopticapi.git
git clone https://github.com/UX-Decoder/Semantic-SAM
cd Semantic-SAM
python -m pip install -r requirements.txt
```

Find out more at [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM).

### 2. Datasets

Please refer to [AVSBenchmark](https://github.com/OpenNLPLab/AVSBench) to download the datasets. You can put the data under the `data` folder or rename your own folder; remember to modify the paths in the config files accordingly. The `data` directory is as below:

```
|--AVS_dataset
   |--AVSBench_semantic/
   |--AVSBench_object/Multi-sources/
   |--AVSBench_object/Single-source/
```

### 3. Download Pre-Trained Models

- The pretrained backbones are available from the AVSBench pretrained backbones [TODO].

```
|--pretrained
   |--detectron2/R-50.pkl
   |--detectron2/d2_pvt_v2_b5.pkl
   |--vggish-10086976.pth
   |--vggish_pca_params-970ea276.pth
```

### 4. Maskiges pregeneration

- Generate class-agnostic masks (Optional)

```shell
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh train # or ms3, avss
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh val
sh avs_tools/pre_mask/pre_mask_semantic_sam_s4.sh test
```

- Generate Maskiges (Optional)

```shell
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split train # or ms3, avss
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split val
python3 avs_tools/pre_mask2rgb/mask_precess_s4.py --split test
```

- Move the Maskiges to the following folders (a sketch of this step follows the directory tree below).

  Note: For convenience, we provide pre-generated Maskiges for the S4/MS3/AVSS subsets at the TODO Hugging Face link.

```
|--AVS_dataset
   |--AVSBench_semantic/pre_SAM_mask/
   |--AVSBench_object/Multi-sources/ms3_data/pre_SAM_mask/
   |--AVSBench_object/Single-source/s4_data/pre_SAM_mask/
```
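For reference, below is a minimal shell sketch of the move step for the S4 subset. The source directory `avs_tools/pre_mask2rgb/output/s4` and the per-split subfolders are assumptions, not paths fixed by this repository; substitute wherever your Maskige-generation step actually wrote its results:

```shell
# Hedged sketch: copy generated S4 Maskiges into the expected pre_SAM_mask layout.
# SRC is an assumed output location -- point it at the directory that
# avs_tools/pre_mask2rgb/mask_precess_s4.py actually wrote to.
SRC=avs_tools/pre_mask2rgb/output/s4
DST=AVS_dataset/AVSBench_object/Single-source/s4_data/pre_SAM_mask

for split in train val test; do
    mkdir -p "${DST}/${split}"
    cp -r "${SRC}/${split}/." "${DST}/${split}/"
done
```

The MS3 and AVSS subsets follow the same pattern, with the `pre_SAM_mask/` targets shown in the tree above.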
### 5. Train

```shell
# ResNet-50
sh scripts/res_train_avs4.sh # or ms3, avss
```

```shell
# PVTv2
sh scripts/pvt_train_avs4.sh # or ms3, avss
```

### 6. Test

```shell
# ResNet-50
sh scripts/res_test_avs4.sh # or ms3, avss
```

```shell
# PVTv2
sh scripts/pvt_test_avs4.sh # or ms3, avss
```

## 🤝 Citing COMBO

```
@misc{yang2023cooperation,
      title={Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation},
      author={Qi Yang and Xing Nie and Tong Li and Pengfei Gao and Ying Guo and Cheng Zhen and Pengfei Yan and Shiming Xiang},
      year={2023},
      eprint={2312.06462},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```