ChihHsuan-Yang committed on
Commit 824c735 (1 parent: 74d0af3)

updated readme

Files changed (1): README.md +162 -140
README.md CHANGED
@@ -1,140 +1,162 @@
- ---
- license: mit
- language:
- - en
- tags:
- - zero-shot-image-classification
- - clip
- - biodiversity
- - vision-language
- - animals
- - species
- - insects
- - taxonomy
- - confounding species
- - multimodal
- datasets:
- - ARBORETUM-40M
- ---
-
-
- # Model Card for ArborCLIP
-
- <!-- Banner links -->
- <div style="text-align:center;">
- <a href="https://baskargroup.github.io/Arboretum/" target="_blank">
- <img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page" style="margin-right:10px;">
- </a>
- <a href="https://github.com/baskargroup/Arboretum" target="_blank">
- <img src="https://img.shields.io/badge/GitHub-Visit-lightgrey" alt="GitHub" style="margin-right:10px;">
- </a>
- <a href="https://pypi.org/project/arbor-process/" target="_blank">
- <img src="https://img.shields.io/badge/PyPI-arbor--process%200.1.0-orange" alt="PyPI arbor-process 0.1.0">
- </a>
- </div>
-
-
- ARBORCLIP is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/), which is a large-scale dataset of 40 million images of 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
-
- - **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
- - **License:** MIT
- - **Fine-tuned from model:** [OpenAI CLIP](https://github.com/mlfoundations/open_clip), [MetaCLIP](https://github.com/facebookresearch/MetaCLIP), [BioCLIP](https://github.com/Imageomics/BioCLIP)
-
- These models were developed for the benefit of the AI community as an open-source product, thus we request that any derivative products are also open-source.
-
- **See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section for examples of how to use ArborCLIP models in zero-shot settings.**
-
-
- ### Model Description
-
- ArborCLIP is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
- The models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) for the following configurations:
-
- - **ARBORCLIP-O:** OpenAI's ViT-B/16 checkpoint, using [OpenCLIP's](https://github.com/mlfoundations/open_clip) weights and trained for 40 epochs.
- - **ARBORCLIP-B:** BioCLIP's ViT-B/16 checkpoint, using [BioCLIP](https://github.com/Imageomics/BioCLIP) weights and trained for 8 epochs
- - **ARBORCLIP-M:** MetaCLIP's ViT-L/14 checkpoint, using [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) weights and trained for 12 epochs
-
-
- To access the checkpoints of the above models, go to the `Files and versions` tab and download the weights. These weights can be directly used for zero-shot classification and finetuning. The filenames correspond to the specific model weights - `arborclip-vit-b-16-from-openai-epoch-40.pt` (**ARBORCLIP-O**), `arborclip-vit-b-16-from-bioclip-epoch-8.pt` (**ARBORCLIP-B**) and`arborclip-vit-l-14-from-metaclip-epoch-12.pt` (**ARBORCLIP-M**).
-
- ### Model Training
-
- We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained for 40 epochs on Arboretum-40M, on 2 nodes, 8xH100 GPUs, on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance compute cluster.
-
- We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html). Our standard training parameters are as follows:
-
- ```
- --dataset-type webdataset
- --pretrained openai
- --text_type random
- --dataset-resampled
- --warmup 5000
- --batch-size 4096
- --accum-freq 1
- --epochs 40
- --workers 8
- --model ViT-B-16
- --lr 0.0005
- --wd 0.0004
- --precision bf16
- --beta1 0.98
- --beta2 0.99
- --eps 1.0e-6
- --local-loss
- --gather-with-grad
- --ddp-static-graph
- --grad-checkpointing
- ```
-
- For more extensive documentation of the training process and the significance of each hyperparameter, we recommend referencing the OpenCLIP and BioCLIP documentation, respectively.
-
- ### Model Validation
-
- For validating the zero-shot accuracy of our trained models and comparing to other benchmarks, we use the [VLHub](https://github.com/penfever/vlhub) repository with some slight modifications.
-
- #### Pre-Run
-
- After cloning this repository and navigating to the `Arboretum/model_validation` directory, we recommend installing all the project requirements into a conda container; `pip install -r requirements.txt`. Also, before executing a command in VLHub, please add `Arboretum/model_validation/src` to your PYTHONPATH.
-
- ```bash
- export PYTHONPATH="$PYTHONPATH:$PWD/src";
- ```
-
- #### Base Command
-
- A basic Arboretum model evaluation command can be launched as follows. This example would evaluate a CLIP-ResNet50 checkpoint whose weights resided at the path designated via the `--resume` flag on the ImageNet validation set, and would report the results to Weights and Biases.
-
- ```bash
- python src/training/main.py --batch-size=32 --workers=8 --imagenet-val "/imagenet/val/" --model="resnet50" --zeroshot-frequency=1 --image-size=224 --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
- ```
-
- ### Training Dataset
- - **Dataset Repository:** [Arboretum](https://github.com/baskargroup/Arboretum)
- - **Dataset Paper:** Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
- - **HF Dataset card:** [Arboretum](https://huggingface.co/datasets/ChihHsuan-Yang/Arboretum)
-
-
-
- <!--BibTex citation -->
- <section class="section" id="BibTeX">
- <div class="container is-max-widescreen content">
- <h2 class="title">Citation</h2>
- If you find this dataset useful in your research, please consider citing our paper:
- <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
- title={Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity},
- author={Chih-Hsuan Yang, Benjamin Feuer, Zaki Jubery, Zi K. Deng, Andre Nakkab,
- Md Zahid Hasan, Shivani Chiranjeevi, Kelly Marshall, Nirmal Baishnab, Asheesh K Singh,
- Arti Singh, Soumik Sarkar, Nirav Merchant, Chinmay Hegde, Baskar Ganapathysubramanian},
- year={2024},
- eprint={2406.17720},
- archivePrefix={arXiv},
- primaryClass={cs.CV},
- url={https://arxiv.org/abs/2406.17720},
- }</code></pre>
- </div>
- </section>
- <!--End BibTex citation -->
-
- ---
-
- For more details and access to the Arboretum dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).
+ ---
+ license: mit
+ language:
+ - en
+ tags:
+ - zero-shot-image-classification
+ - clip
+ - biology
+ - CV
+ - images
+ - animals
+ - species
+ - taxonomy
+ - rare species
+ - endangered species
+ - evolutionary biology
+ - multimodal
+ - knowledge-guided
+ datasets:
+ - imageomics/TreeOfLife-10M
+ - iNat21
+ - BIOSCAN-1M
+ - EOL
+ ---
+
+
+ # Model Card for ArborCLIP
+
+ <!-- Banner links -->
+ <div style="text-align:center;">
+ <a href="https://baskargroup.github.io/Arboretum/" target="_blank">
+ <img src="https://img.shields.io/badge/Project%20Page-Visit-blue" alt="Project Page" style="margin-right:10px;">
+ </a>
+ <a href="https://github.com/baskargroup/Arboretum" target="_blank">
+ <img src="https://img.shields.io/badge/GitHub-Visit-lightgrey" alt="GitHub" style="margin-right:10px;">
+ </a>
+ <a href="https://pypi.org/project/arbor-process/" target="_blank">
+ <img src="https://img.shields.io/badge/PyPI-arbor--process%200.1.0-orange" alt="PyPI arbor-process 0.1.0">
+ </a>
+ </div>
+
+
+ ARBORCLIP is a new suite of vision-language foundation models for biodiversity. These CLIP-style foundation models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/), a large-scale dataset of 40 million images spanning 33K species of plants and animals. The models are evaluated on zero-shot image classification tasks.
+
+ - **Model type:** Vision Transformer (ViT-B/16, ViT-L/14)
+ - **License:** MIT
+ - **Fine-tuned from model:** [OpenAI CLIP](https://github.com/mlfoundations/open_clip), [MetaCLIP](https://github.com/facebookresearch/MetaCLIP), [BioCLIP](https://github.com/Imageomics/BioCLIP)
+
+ These models were developed for the benefit of the AI community as an open-source product, so we request that any derivative products also be open-source.
+
+
+ ### Model Description
+
+ ArborCLIP is based on OpenAI's [CLIP](https://openai.com/research/clip) model.
+ The models were trained on [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) in the following configurations:
+
+ - **ARBORCLIP-O:** a ViT-B/16 backbone initialized from the [OpenCLIP](https://github.com/mlfoundations/open_clip) checkpoint and trained for 40 epochs.
+ - **ARBORCLIP-B:** a ViT-B/16 backbone initialized from the [BioCLIP](https://github.com/Imageomics/BioCLIP) checkpoint and trained for 8 epochs.
+ - **ARBORCLIP-M:** a ViT-L/14 backbone initialized from the [MetaCLIP](https://github.com/facebookresearch/MetaCLIP) checkpoint and trained for 12 epochs.
+
+
+ To access the checkpoints of the above models, go to the `Files and versions` tab and download the weights. These weights can be used directly for zero-shot classification and finetuning. The filenames correspond to the specific model weights: `arborclip-vit-b-16-from-openai-epoch-40.pt` (**ARBORCLIP-O**), `arborclip-vit-b-16-from-bioclip-epoch-8.pt` (**ARBORCLIP-B**), and `arborclip-vit-l-14-from-metaclip-epoch-12.pt` (**ARBORCLIP-M**).
+
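+ As a minimal sketch of zero-shot use, the downloaded weights can be loaded through the [OpenCLIP](https://github.com/mlfoundations/open_clip) API. The checkpoint path, image path, and species labels below are placeholders, and loading the `.pt` file directly via the `pretrained=` argument is our assumption about the simplest route:
+
+ ```python
+ # Hedged sketch: zero-shot classification with an ArborCLIP checkpoint (placeholder paths and labels).
+ import torch
+ import open_clip
+ from PIL import Image
+
+ model, _, preprocess = open_clip.create_model_and_transforms(
+     "ViT-B-16", pretrained="arborclip-vit-b-16-from-openai-epoch-40.pt"
+ )
+ tokenizer = open_clip.get_tokenizer("ViT-B-16")
+ model.eval()
+
+ labels = ["Danaus plexippus", "Apis mellifera", "Quercus alba"]  # example species names
+ image = preprocess(Image.open("example.jpg")).unsqueeze(0)
+ text = tokenizer([f"a photo of {name}" for name in labels])
+
+ with torch.no_grad():
+     image_features = model.encode_image(image)
+     text_features = model.encode_text(text)
+     image_features /= image_features.norm(dim=-1, keepdim=True)
+     text_features /= text_features.norm(dim=-1, keepdim=True)
+     probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
+
+ print(dict(zip(labels, probs.squeeze(0).tolist())))
+ ```
+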
+ ### Model Training
+ **See the [Model Training](https://github.com/baskargroup/Arboretum?tab=readme-ov-file#model-training) section of the [GitHub](https://github.com/baskargroup/Arboretum) repository for examples of how to use ArborCLIP models in zero-shot image classification tasks.**
+
+ We train three models using a modified version of the [BioCLIP / OpenCLIP](https://github.com/Imageomics/bioclip/tree/main/src/training) codebase. Each model is trained on Arboretum-40M on 2 nodes with 8xH100 GPUs, on NYU's [Greene](https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene) high-performance compute cluster. We publicly release all code needed to reproduce our results on the [GitHub](https://github.com/baskargroup/Arboretum) page.
+
+ We optimize our hyperparameters prior to training with [Ray](https://docs.ray.io/en/latest/index.html); a minimal tuning sketch follows the parameter list below. Our standard training parameters are as follows:
70
+
71
+ ```
72
+ --dataset-type webdataset
73
+ --pretrained openai
74
+ --text_type random
75
+ --dataset-resampled
76
+ --warmup 5000
77
+ --batch-size 4096
78
+ --accum-freq 1
79
+ --epochs 40
80
+ --workers 8
81
+ --model ViT-B-16
82
+ --lr 0.0005
83
+ --wd 0.0004
84
+ --precision bf16
85
+ --beta1 0.98
86
+ --beta2 0.99
87
+ --eps 1.0e-6
88
+ --local-loss
89
+ --gather-with-grad
90
+ --ddp-static-graph
91
+ --grad-checkpointing
92
+ ```
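+
+ The Ray sweep itself is not included in this card; the sketch below only illustrates, under our own assumptions, how a learning-rate/weight-decay search around the values above could be wired up with [Ray Tune](https://docs.ray.io/en/latest/tune/index.html). The probe objective is a dummy stand-in; in practice each trial would launch a short training run with the flags listed above.
+
+ ```python
+ # Hedged sketch: a small Ray Tune sweep over lr and wd (dummy objective, illustrative only).
+ from ray import tune
+
+ def short_training_probe(config):
+     # Stand-in for a short OpenCLIP training run that returns a validation metric.
+     val_loss = (config["lr"] - 5e-4) ** 2 + (config["wd"] - 4e-4) ** 2
+     return {"val_loss": val_loss}
+
+ tuner = tune.Tuner(
+     short_training_probe,
+     param_space={
+         "lr": tune.loguniform(1e-5, 1e-3),
+         "wd": tune.loguniform(1e-5, 1e-3),
+     },
+     tune_config=tune.TuneConfig(metric="val_loss", mode="min", num_samples=16),
+ )
+ best = tuner.fit().get_best_result()
+ print(best.config)
+ ```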
+
+ For more extensive documentation of the training process and the significance of each hyperparameter, we recommend consulting the [OpenCLIP](https://github.com/mlfoundations/open_clip) and [BioCLIP](https://github.com/Imageomics/BioCLIP) documentation.
+
+ ### Model Validation
+
+ For validating the zero-shot accuracy of our trained models and comparing them to other benchmarks, we use the [VLHub](https://github.com/penfever/vlhub) repository with some slight modifications.
+
+ #### Pre-Run
+
+ After cloning the [GitHub](https://github.com/baskargroup/Arboretum) repository and navigating to the `Arboretum/model_validation` directory, we recommend installing all the project requirements into a conda environment with `pip install -r requirements.txt`. Also, before executing a command in VLHub, please add `Arboretum/model_validation/src` to your `PYTHONPATH`:
+
+ ```bash
+ export PYTHONPATH="$PYTHONPATH:$PWD/src";
+ ```
+
+ #### Base Command
+
+ A basic Arboretum model evaluation command can be launched as follows. This example evaluates a CLIP-ResNet50 checkpoint, whose weights reside at the path passed via the `--resume` flag, on the ImageNet validation set and reports the results to Weights and Biases.
+
+ ```bash
+ python src/training/main.py --batch-size=32 --workers=8 --imagenet-val "/imagenet/val/" --model="resnet50" --zeroshot-frequency=1 --image-size=224 --resume "/PATH/TO/WEIGHTS.pth" --report-to wandb
+ ```
+
+ ### Training Dataset
+ - **Dataset Repository:** [Arboretum](https://github.com/baskargroup/Arboretum)
+ - **Dataset Paper:** Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity ([arXiv](https://arxiv.org/abs/2406.17720))
+ - **HF Dataset card:** [Arboretum](https://huggingface.co/datasets/ChihHsuan-Yang/Arboretum)
+
+
+ ### Model Limitations
+ All of the `ArborCLIP` models were evaluated on the challenging [CONFOUNDING-SPECIES](https://arxiv.org/abs/2306.02507) benchmark, but all of them performed at or below random chance. Improving performance on this benchmark is an interesting avenue for follow-up work that could further expand the models' capabilities.
+
+ In general, we found that models trained on web-scraped data performed better with common names, whereas models trained on specialist datasets performed better when using scientific names. Additionally, models trained on web-scraped data excel at classifying at the highest taxonomic level (kingdom), while models begin to benefit from specialist datasets like [ARBORETUM-40M](https://baskargroup.github.io/Arboretum/) and [Tree-of-Life-10M](https://huggingface.co/datasets/imageomics/TreeOfLife-10M) at lower taxonomic levels (order and species). From a practical standpoint, `ArborCLIP` is highly accurate at the species level, and higher-level taxa can be deterministically derived from lower ones, as the small sketch below illustrates.
+
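+ The species-to-higher-taxon roll-up is a fixed lookup rather than a model call; the two species entries below are only illustrative examples:
+
+ ```python
+ # Hedged sketch: deriving coarser taxonomic ranks from species-level predictions via a lookup table.
+ SPECIES_TO_HIGHER_TAXA = {
+     "Danaus plexippus": {"family": "Nymphalidae", "order": "Lepidoptera", "kingdom": "Animalia"},
+     "Quercus alba": {"family": "Fagaceae", "order": "Fagales", "kingdom": "Plantae"},
+ }
+
+ def roll_up(predicted_species: str, rank: str) -> str:
+     """Map a species-level prediction to a coarser taxonomic rank."""
+     return SPECIES_TO_HIGHER_TAXA[predicted_species][rank]
+
+ print(roll_up("Danaus plexippus", "order"))  # -> Lepidoptera
+ ```
+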
+ Addressing these limitations will further enhance the applicability of models like `ArborCLIP` in real-world biodiversity monitoring tasks.
133
+
134
+ ### Acknowledgements
135
+ This work was supported by the AI Research Institutes program supported by the NSF and USDA-NIFA under [AI Institute: for Resilient Agriculture](https://aiira.iastate.edu/), Award No. 2021-67021-35329. This was also
136
+ partly supported by the NSF under CPS Frontier grant CNS-1954556. Also, we gratefully
137
+ acknowledge the support of NYU IT [High Performance Computing](https://www.nyu.edu/life/information-technology/research-computing-services/high-performance-computing.html) resources, services, and staff
138
+ expertise.
+
+ <!--BibTex citation -->
+ <section class="section" id="BibTeX">
+ <div class="container is-max-widescreen content">
+ <h2 class="title">Citation</h2>
+ If you find the models and datasets useful in your research, please consider citing our paper:
+ <pre><code>@misc{yang2024arboretumlargemultimodaldataset,
+ title={Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity},
+ author={Chih-Hsuan Yang and Benjamin Feuer and Zaki Jubery and Zi K. Deng and Andre Nakkab
+ and Md Zahid Hasan and Shivani Chiranjeevi and Kelly Marshall and Nirmal Baishnab and Asheesh K Singh
+ and Arti Singh and Soumik Sarkar and Nirav Merchant and Chinmay Hegde and Baskar Ganapathysubramanian},
+ year={2024},
+ eprint={2406.17720},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2406.17720},
+ }</code></pre>
+ </div>
+ </section>
+ <!--End BibTex citation -->
+
+ ---
+
+ For more details and access to the Arboretum dataset, please visit the [Project Page](https://baskargroup.github.io/Arboretum/).