ChihHsuan-Yang committed on
Commit 824c735 (1 parent: 74d0af3)

updated readme

Files changed (1): README.md +162 -140
README.md CHANGED
@@ -1,140 +1,162 @@
- ---
- license: mit
- language:
- - en
- tags:
- - zero-shot-image-classification
- - clip
- - biodiversity
- - vision-language
- - animals
- - species
- - insects
- - taxonomy
- - confounding species
- - multimodal
- datasets:
- - ARBORETUM-40M
- ---
-
-
- # Model Card for ArborCLIP
-
- <!-- Banner links -->
- <div style="text-align:center;">
- <a href="https://baskargroup.github.io/Arboretum/" target="_blank">
- <img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page" style="margin-right:10px;">
- </a>
- <a href="https://github.com/baskargroup/Arboretum" target="_blank">
- <img src="https://img.shields.io/badge/GitHub-Visit-lightgrey" alt="GitHub" style="margin-right:10px;">
- </a>
- <a href="https://pypi.org/project/arbor-process/" target="_blank">
- <img src="https://img.shields.io/badge/PyPI-arbor--process%200.1.0-orange" alt="PyPI arbor-process 0.1.0">
- </a>
- </div>
-
-
- ARBORCLIP is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/), which is a large-scale dataset of 40 million images of 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
-
- - **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
- - **License:** MIT
- - **Fine-tuned from model:** [OpenAI CLIP](https://github.com/mlfoundations/open_clip), [MetaCLIP](https://github.com/facebookresearch/MetaCLIP), [BioCLIP](https://github.com/Imageomics/BioCLIP)
-
- These models were developed for the benefit of the AI community as an open-source product, thus we request that any derivative products are also open-source.
-
- **See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section for examples of how to use ArborCLIP models in zero-shot settings.**
-
-
- ### Model Description
-
- ArborCLIP is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
- The models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) for the following configurations:
-
- - **ARBORCLIP-O:** OpenAI's ViT-B/16 checkpoint, using [OpenCLIP's](https://github.com/mlfoundations/open_clip) weights and trained for 40 epochs.
- - **ARBORCLIP-B:** BioCLIP's ViT-B/16 checkpoint, using [BioCLIP](https://github.com/Imageomics/BioCLIP) weights and trained for 8 epochs
- - **ARBORCLIP-M:** MetaCLIP's ViT-L/14 checkpoint, using [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) weights and trained for 12 epochs
-
-
- To access the checkpoints of the above models, go to the `Files and versions` tab and download the weights. These weights can be directly used for zero-shot classification and finetuning. The filenames correspond to the specific model weights - `arborclip-vit-b-16-from-openai-epoch-40.pt` (**ARBORCLIP-O**), `arborclip-vit-b-16-from-bioclip-epoch-8.pt` (**ARBORCLIP-B**) and`arborclip-vit-l-14-from-metaclip-epoch-12.pt` (**ARBORCLIP-M**).
-
- ### Model Training
-
- We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained for 40 epochs on Arboretum-40M, on 2 nodes, 8xH100 GPUs, on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance compute cluster.
-
- We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html). Our standard training parameters are as follows:
-
- ```
- --dataset-type webdataset
- --pretrained openai
- --text_type random
- --dataset-resampled
- --warmup 5000
- --batch-size 4096
- --accum-freq 1
- --epochs 40
- --workers 8
- --model ViT-B-16
- --lr 0.0005
- --wd 0.0004
- --precision bf16
- --beta1 0.98
- --beta2 0.99
- --eps 1.0e-6
- --local-loss
- --gather-with-grad
- --ddp-static-graph
- --grad-checkpointing
- ```
-
- For more extensive documentation of the training process and the significance of each hyperparameter, we recommend referencing the OpenCLIP and BioCLIP documentation, respectively.
-
- ### Model Validation
-
- For validating the zero-shot accuracy of our trained models and comparing to other benchmarks, we use the [VLHub](https://github.com/penfever/vlhub) repository with some slight modifications.
-
- #### Pre-Run
-
- After cloning this repository and navigating to the `Arboretum/model_validation` directory, we recommend installing all the project requirements into a conda container; `pip install -r requirements.txt`. Also, before executing a command in VLHub, please add `Arboretum/model_validation/src` to your PYTHONPATH.
-
- ```bash
- export PYTHONPATH="$PYTHONPATH:$PWD/src";
- ```
-
- #### Base Command
-
- A basic Arboretum model evaluation command can be launched as follows. This example would evaluate a CLIP-ResNet50 checkpoint whose weights resided at the path designated via the `--resume` flag on the ImageNet validation set, and would report the results to Weights and Biases.
-
- ```bash
- python src/training/main.py --batch-size=32 --workers=8 --imagenet-val "/imagenet/val/" --model="resnet50" --zeroshot-frequency=1 --image-size=224 --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
- ```
-
- ### Training Dataset
- - **Dataset Repository:** [Arboretum](https://github.com/baskargroup/Arboretum)
- - **Dataset Paper:** Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
- - **HF Dataset card:** [Arboretum](https://huggingface.co/datasets/ChihHsuan-Yang/Arboretum)
-
-
-
- <!--BibTex citation -->
- <section class="section" id="BibTeX">
- <div class="container is-max-widescreen content">
- <h2 class="title">Citation</h2>
- If you find this dataset useful in your research, please consider citing our paper:
- <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
- title={Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity},
- author={Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab,
- Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall, Nirmal Baishnab, Asheesh K Singh,
- Arti Singh, Soumik Sarkar, Nirav Merchant, Chinmay Hegde, Baskar Ganapathysubramanian},
- year={2024},
- eprint={2406.17720},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2406.17720},
- }</code></pre>
- </div>
- </section>
- <!--End BibTex citation -->
-
- ---
-
- For more details and access to the Arboretum dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).
+ ---
+ license: mit
+ language:
+ - en
+ tags:
+ - zero-shot-image-classification
+ - clip
+ - biology
+ - CV
+ - images
+ - animals
+ - species
+ - taxonomy
+ - rare species
+ - endangered species
+ - evolutionary biology
+ - multimodal
+ - knowledge-guided
+ datasets:
+ - imageomics/TreeOfLife-10M
+ - iNat21
+ - BIOSCAN-1M
+ - EOL
+ ---
+
+
+ # Model Card for ArborCLIP
+
+ <!-- Banner links -->
+ <div style="text-align:center;">
+ <a href="https://baskargroup.github.io/Arboretum/" target="_blank">
+ <img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page" style="margin-right:10px;">
+ </a>
+ <a href="https://github.com/baskargroup/Arboretum" target="_blank">
+ <img src="https://img.shields.io/badge/GitHub-Visit-lightgrey" alt="GitHub" style="margin-right:10px;">
+ </a>
+ <a href="https://pypi.org/project/arbor-process/" target="_blank">
+ <img src="https://img.shields.io/badge/PyPI-arbor--process%200.1.0-orange" alt="PyPI arbor-process 0.1.0">
+ </a>
+ </div>
+
+
+ ARBORCLIP is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/), a large-scale dataset of 40 million images spanning 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
+
+ - **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
+ - **License:** MIT
+ - **Fine-tuned from model:** [OpenAI CLIP](https://github.com/mlfoundations/open_clip), [MetaCLIP](https://github.com/facebookresearch/MetaCLIP), [BioCLIP](https://github.com/Imageomics/BioCLIP)
+
+ These models were developed for the benefit of the AI community as an open-source product, so we request that any derivative products also be open-source.
+
+
+ ### Model Description
+
+ ArborCLIP is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
+ The models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) in the following configurations:
+
+ - **ARBORCLIP-O:** a ViT-B/16 backbone initialized from the [OpenCLIP](https://github.com/mlfoundations/open_clip) checkpoint and trained for 40 epochs.
+ - **ARBORCLIP-B:** a ViT-B/16 backbone initialized from the [BioCLIP](https://github.com/Imageomics/BioCLIP) checkpoint and trained for 8 epochs.
+ - **ARBORCLIP-M:** a ViT-L/14 backbone initialized from the [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) checkpoint and trained for 12 epochs.
+
+
+ To access the checkpoints of the above models, go to the `Files and versions` tab and download the weights. These weights can be used directly for zero-shot classification and finetuning. The filenames correspond to the specific model weights: `arborclip-vit-b-16-from-openai-epoch-40.pt` (**ARBORCLIP-O**), `arborclip-vit-b-16-from-bioclip-epoch-8.pt` (**ARBORCLIP-B**), and `arborclip-vit-l-14-from-metaclip-epoch-12.pt` (**ARBORCLIP-M**).
+
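+ As a minimal sketch of zero-shot use, the downloaded weights can be loaded through the [OpenCLIP](https://github.com/mlfoundations/open_clip) API. The checkpoint path, image path, and species labels below are placeholders, and loading the `.pt` file directly via the `pretrained=` argument is our assumption about the simplest route:
+
+ ```python
+ # Hedged sketch: zero-shot classification with an ArborCLIP checkpoint (placeholder paths and labels).
+ import torch
+ import open_clip
+ from PIL import Image
+
+ model, _, preprocess = open_clip.create_model_and_transforms(
+     "ViT-B-16", pretrained="arborclip-vit-b-16-from-openai-epoch-40.pt"
+ )
+ tokenizer = open_clip.get_tokenizer("ViT-B-16")
+ model.eval()
+
+ labels = ["Danaus plexippus", "Apis mellifera", "Quercus alba"]  # example species names
+ image = preprocess(Image.open("example.jpg")).unsqueeze(0)
+ text = tokenizer([f"a photo of {name}" for name in labels])
+
+ with torch.no_grad():
+     image_features = model.encode_image(image)
+     text_features = model.encode_text(text)
+     image_features /= image_features.norm(dim=-1, keepdim=True)
+     text_features /= text_features.norm(dim=-1, keepdim=True)
+     probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+
+ print(dict(zip(labels, probs.squeeze(0).tolist())))
+ ```
+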
+ ### Model Training
+ **See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section of the [GitHub](https://github.com/baskargroup/Arboretum) repository for examples of how to use ArborCLIP models in zero-shot image classification tasks.**
+
+ We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained on Arboretum-40M on 2 nodes with 8xH100 GPUs, on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance compute cluster. We publicly release all code needed to reproduce our results on the [GitHub](https://github.com/baskargroup/Arboretum) page.
+
+ We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html); a minimal tuning sketch follows the parameter list below. Our standard training parameters are as follows:
70
+
71
+ ```
72
+ --dataset-type webdataset
73
+ --pretrained openai
74
+ --text_type random
75
+ --dataset-resampled
76
+ --warmup 5000
77
+ --batch-size 4096
78
+ --accum-freq 1
79
+ --epochs 40
80
+ --workers 8
81
+ --model ViT-B-16
82
+ --lr 0.0005
83
+ --wd 0.0004
84
+ --precision bf16
85
+ --beta1 0.98
86
+ --beta2 0.99
87
+ --eps 1.0e-6
88
+ --local-loss
89
+ --gather-with-grad
90
+ --ddp-static-graph
91
+ --grad-checkpointing
92
+ ```
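+
+ The Ray sweep itself is not included in this card; the sketch below only illustrates, under our own assumptions, how a learning-rate/weight-decay search around the values above could be wired up with [Ray Tune](https://docs.ray.io/en/latest/tune/index.html). The probe objective is a dummy stand-in; in practice each trial would launch a short training run with the flags listed above.
+
+ ```python
+ # Hedged sketch: a small Ray Tune sweep over lr and wd (dummy objective, illustrative only).
+ from ray import tune
+
+ def short_training_probe(config):
+     # Stand-in for a short OpenCLIP training run that returns a validation metric.
+     val_loss = (config["lr"] - 5e-4) ** 2 + (config["wd"] - 4e-4) ** 2
+     return {"val_loss": val_loss}
+
+ tuner = tune.Tuner(
+     short_training_probe,
+     param_space={
+         "lr": tune.loguniform(1e-5, 1e-3),
+         "wd": tune.loguniform(1e-5, 1e-3),
+     },
+     tune_config=tune.TuneConfig(metric="val_loss", mode="min", num_samples=16),
+ )
+ best = tuner.fit().get_best_result()
+ print(best.config)
+ ```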
+
+ For more extensive documentation of the training process and the significance of each hyperparameter, we recommend consulting the [OpenCLIP](https://github.com/mlfoundations/open_clip) and [BioCLIP](https://github.com/Imageomics/BioCLIP) documentation.
+
+ ### Model Validation
+
+ For validating the zero-shot accuracy of our trained models and comparing them to other benchmarks, we use the [VLHub](https://github.com/penfever/vlhub) repository with some slight modifications.
+
+ #### Pre-Run
+
+ After cloning the [GitHub](https://github.com/baskargroup/Arboretum) repository and navigating to the `Arboretum/model_validation` directory, we recommend installing all the project requirements into a conda environment with `pip install -r requirements.txt`. Also, before executing a command in VLHub, please add `Arboretum/model_validation/src` to your `PYTHONPATH`:
+
+ ```bash
+ export PYTHONPATH="$PYTHONPATH:$PWD/src";
+ ```
+
+ #### Base Command
+
+ A basic Arboretum model evaluation command can be launched as follows. This example evaluates a CLIP-ResNet50 checkpoint, whose weights reside at the path passed via the `--resume` flag, on the ImageNet validation set and reports the results to Weights and Biases.
+
+ ```bash
+ python src/training/main.py --batch-size=32 --workers=8 --imagenet-val "/imagenet/val/" --model="resnet50" --zeroshot-frequency=1 --image-size=224 --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
+ ```
+
+ ### Training Dataset
+ - **Dataset Repository:** [Arboretum](https://github.com/baskargroup/Arboretum)
+ - **Dataset Paper:** Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
+ - **HF Dataset card:** [Arboretum](https://huggingface.co/datasets/ChihHsuan-Yang/Arboretum)
+
+
+ ### Model Limitations
+ All of the `ArborCLIP` models were evaluated on the challenging [CONFOUNDING-SPECIES](https://arxiv.org/abs/2306.02507) benchmark, but all of them performed at or below random chance. Improving performance on this benchmark is an interesting avenue for follow-up work that could further expand the models' capabilities.
+
+ In general, we found that models trained on web-scraped data performed better with common names, whereas models trained on specialist datasets performed better when using scientific names. Additionally, models trained on web-scraped data excel at classifying at the highest taxonomic level (kingdom), while models begin to benefit from specialist datasets like [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) and [Tree-of-Life-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M) at lower taxonomic levels (order and species). From a practical standpoint, `ArborCLIP` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones, as the small sketch below illustrates.
+
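+ The species-to-higher-taxon roll-up is a fixed lookup rather than a model call; the two species entries below are only illustrative examples:
+
+ ```python
+ # Hedged sketch: deriving coarser taxonomic ranks from species-level predictions via a lookup table.
+ SPECIES_TO_HIGHER_TAXA = {
+     "Danaus plexippus": {"family": "Nymphalidae", "order": "Lepidoptera", "kingdom": "Animalia"},
+     "Quercus alba": {"family": "Fagaceae", "order": "Fagales", "kingdom": "Plantae"},
+ }
+
+ def roll_up(predicted_species: str, rank: str) -> str:
+     """Map a species-level prediction to a coarser taxonomic rank."""
+     return SPECIES_TO_HIGHER_TAXA[predicted_species][rank]
+
+ print(roll_up("Danaus plexippus", "order"))  # -> Lepidoptera
+ ```
+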
+ Addressing these limitations will further enhance the applicability of models like `ArborCLIP` in real-world biodiversity monitoring tasks.
133
+
134
+ ### Acknowledgements
135
+ This work was supported by the AI Research Institutes program supported by the NSF and USDA-NIFA under [AI Institute: for Resilient Agriculture](https://aiira.iastate.edu/), Award No. 2021-67021-35329. This was also
136
+ partly supported by the NSF under CPS Frontier grant CNS-1954556. Also, we gratefully
137
+ acknowledge the support of NYU IT [High Performance Computing](https://www.nyu.edu/life/information-technology/research-computing-services/high-performance-computing.html) resources, services, and staff
138
+ expertise.
+
+ <!--BibTex citation -->
+ <section class="section" id="BibTeX">
+ <div class="container is-max-widescreen content">
+ <h2 class="title">Citation</h2>
+ If you find the models and datasets useful in your research, please consider citing our paper:
+ <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
+ title={Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity},
+ author={Chih-Hsuan Yang and Benjamin Feuer and Zaki Jubery and Zi K. Deng and Andre Nakkab
+ and Md Zahid Hasan and Shivani Chiranjeevi and Kelly Marshall and Nirmal Baishnab and Asheesh K Singh
+ and Arti Singh and Soumik Sarkar and Nirav Merchant and Chinmay Hegde and Baskar Ganapathysubramanian},
+ year={2024},
+ eprint={2406.17720},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2406.17720},
+ }</code></pre>
+ </div>
+ </section>
+ <!--End BibTex citation -->
+
+ ---
+
+ For more details and access to the Arboretum dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).