Edit model card

ShapeLLM model

This repository contains the ShapeLLM-13B model presented in ShapeLLM: Universal 3D Object Understanding for Embodied Interaction.

Install

  1. Clone this repository and navigate to ShapeLLM folder
git clone https://github.com/qizekun/ShapeLLM.git
cd ShapeLLM
  1. Install Package
conda create -n shapellm python=3.10 -y
conda activate shapellm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  1. Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
  1. Install PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

ShapeLLM

model weights

Please check out our Model Zoo for all public ShapeLLM checkpoints.

Demo

CLI Inference

Chat about point clouds using CLI interface. It also supports multiple GPUs, 4-bit and 8-bit quantized inference.

python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy

Training

Consistent with LLaVA, we adopt a two-stage training approach. In the first stage, we solely fine-tune the projector for semantic alignment. In the second stage, we conduct full fine-tuning using Instruction Following data. Download data following DATA, organize the data as follows in ./playground/data/shapellm/,

β”‚playground/data/shapellm/
β”œβ”€β”€ cap3d_objaverse_785k.json
β”œβ”€β”€ cap3d_objaverse_sft_45k.json
β”œβ”€β”€ gapartnet_sft_27k_openai.json
β”œβ”€β”€ gapartnet_pcs
β”‚   β”œβ”€β”€ Box_100129_0_0.npy
β”‚   └── ...
└── cap3d_pcs
    β”œβ”€β”€ 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...

Furthermore, ShapeLLM utilizes the Large version of ReCon++ as the point encoder. You need to download the ReCon++ weight and save it to ./checkpoints/recon/large.pth.

β”‚checkpoints/recon/
└── large.pth

1. Feature Alignment Stage

sh scripts/pretrain.sh

2. Visual Instruction Tuning Stage

sh scripts/finetune.sh

The training takes around 14 hours for ShapeLLM-13B on 8x A100 (80G). It takes around 7 hours for ShapeLLM-7B.

Zero-shot Understanding on 3D MM-Vet

Evaluate 3D MLLMs for integrated capabilities and embodied interaction capabilities, run the script:

sh scripts/eval/mmvet.sh

Using GPT4 to calulate the 3D MM-Vet score:

sh scripts/eval/eval_mmvet.sh

Visual Grounding on GApartNet

Evaluate the performance of ShapeLLM on the GApartNet dataset, run the script:

sh scripts/eval/gapartnet_ref.sh

Calucate the generative 3D visual grounding accuracy:

sh scripts/eval/eval_gapartnet.sh
Downloads last month
77
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train qizekun/ShapeLLM_13B_general_v1.0

Collection including qizekun/ShapeLLM_13B_general_v1.0