LLaVA-3D

Table of Contents

  1. Model Summary
  2. Use
  3. Limitations
  4. Training
  5. License
  6. Citation

Model Summary

LLaVA-3D is a 7B-parameter model trained on LLaVA-3D-Instruct-1M, built on top of LLaVA-v1.5-7B.

Use

Intended use

The model was trained on LLaVA-3D-Instruct-1M and can operate on a single image for 2D tasks and on posed RGB-D images for 3D tasks.
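"Posed RGB-D" means each frame carries a depth map and a camera pose, so every depth pixel can be lifted to a 3D world point. A minimal sketch of that back-projection with a pinhole camera model (the values and function name are illustrative; LLaVA-3D's actual preprocessing lives in its codebase):

```python
def backproject(u, v, depth, fx, fy, cx, cy, pose):
    """Lift a depth pixel (u, v) to a 3D point in world coordinates.

    fx, fy, cx, cy are pinhole intrinsics; pose is a 4x4
    camera-to-world matrix given as nested lists.
    """
    # Pinhole back-projection into the camera frame.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    # Apply the camera-to-world pose (rotation + translation).
    p = [x, y, z, 1.0]
    return tuple(
        sum(pose[r][c] * p[c] for c in range(4)) for r in range(3)
    )

# With an identity pose, the world point equals the camera-frame point.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
point = backproject(320, 240, 2.0, 500.0, 500.0, 320.0, 240.0, identity)
# point == (0.0, 0.0, 2.0): the principal-point pixel projects straight ahead.
```

Aggregating such lifted points across the posed frames is what gives a 2D LMM a consistent 3D view of the scene.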

Feel free to share your generations in the Community tab!

Training

Model

  • Pretraining Stage: scene-level and region-level caption data, 1 epoch, projector
  • Instruction Tuning Stage: a mixture of 1M high-quality 2D and 3D data, 1 epoch, full model
  • Precision: bfloat16
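The two stages above differ mainly in which parameters are updated. A minimal sketch of that schedule as plain configuration (module names such as "projector" and "llm" are illustrative placeholders, not the repository's actual identifiers):

```python
# Hypothetical stage schedule mirroring the bullets above; the module
# names ("vision_tower", "projector", "llm") are illustrative only.
STAGES = [
    {"name": "pretraining", "epochs": 1,
     "data": "scene-level and region-level captions",
     "trainable": {"projector"}},
    {"name": "instruction_tuning", "epochs": 1,
     "data": "1M mixed 2D and 3D instruction data",
     "trainable": {"vision_tower", "projector", "llm"}},
]

def is_trainable(module, stage):
    """Whether `module` is updated during `stage`."""
    return module in stage["trainable"]

# Pretraining updates only the projector; instruction tuning
# unfreezes the full model.
assert is_trainable("projector", STAGES[0])
assert not is_trainable("llm", STAGES[0])
assert is_trainable("llm", STAGES[1])
```

Training only the projector first is the usual way to align vision features with the frozen language model before full fine-tuning.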

Hardware & Software

Citation

@article{zhu2024llava,
  title={LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness},
  author={Zhu, Chenming and Wang, Tai and Zhang, Wenwei and Pang, Jiangmiao and Liu, Xihui},
  journal={arXiv preprint arXiv:2409.18125},
  year={2024}
}