---
license: apache-2.0
datasets:
  - qizekun/ShapeLLM
language:
  - en
---

Install

  1. Clone this repository and navigate to the ShapeLLM folder
git clone https://github.com/qizekun/ShapeLLM.git
cd ShapeLLM
  2. Install the package
conda create -n shapellm python=3.10 -y
conda activate shapellm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
  4. Install PointNet++ (a quick import check follows below)
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"

ShapeLLM

Model Weights

Please check out our Model Zoo for all public ShapeLLM checkpoints.

Demo

CLI Inference

Chat about point clouds using the CLI interface. It also supports multi-GPU inference as well as 4-bit and 8-bit quantized inference.

python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_7B_gapartnet_v1.0 \
    --pts-file playground/data/shapellm/gapartnet_pcs/StorageFurniture_45189_0_1.npy
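
Since ShapeLLM's CLI follows LLaVA's, quantized inference should be available through the upstream --load-4bit and --load-8bit flags (an assumption based on LLaVA's interface; check python -m llava.serve.cli --help for the exact options), e.g.:

python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_7B_gapartnet_v1.0 \
    --pts-file playground/data/shapellm/gapartnet_pcs/StorageFurniture_45189_0_1.npy \
    --load-4bit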

Training

Consistent with LLaVA, we adopt a two-stage training approach. In the first stage, we fine-tune only the projector for semantic alignment. In the second stage, we conduct full fine-tuning on the instruction-following data. Download the data following DATA and organize it under ./playground/data/shapellm/ as follows:

│playground/data/shapellm/
├── cap3d_objaverse_785k.json
├── cap3d_objaverse_sft_45k.json
├── gapartnet_sft_27k_openai.json
├── gapartnet_pcs
│   ├── Box_100129_0_0.npy
│   └── ...
└── cap3d_pcs
    ├── 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...
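
As a quick sanity check that the data is readable, you can load the example files from the layout above (a minimal sketch; the exact array shapes and .pt contents depend on the DATA preparation):

python -c "import numpy as np; print(np.load('playground/data/shapellm/gapartnet_pcs/Box_100129_0_0.npy').shape)"
python -c "import torch; obj = torch.load('playground/data/shapellm/cap3d_pcs/00000054c36d44a2a483bdbff31d8edf.pt', map_location='cpu'); print(type(obj))"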

Furthermore, ShapeLLM uses the Large version of ReCon++ as the point encoder. You need to download the ReCon++ weights and save them to ./checkpoints/recon/large.pth.

│checkpoints/recon/
└── large.pth
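
A quick way to confirm the weights are in place and loadable (a minimal sketch; no assumption is made about the key layout inside the checkpoint):

python -c "import torch; ckpt = torch.load('./checkpoints/recon/large.pth', map_location='cpu'); print(type(ckpt))"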

1. Feature Alignment Stage

sh scripts/pretrain.sh

2. Visual Instruction Tuning Stage

sh scripts/finetune.sh

Training ShapeLLM-13B takes around 14 hours on 8x A100 (80GB) GPUs; ShapeLLM-7B takes around 7 hours.

Zero-shot Understanding on 3D MM-Vet

To evaluate 3D MLLMs on integrated capabilities and embodied interaction, run the script:

sh scripts/eval/mmvet.sh

Use GPT-4 to calculate the 3D MM-Vet score:

sh scripts/eval/eval_mmvet.sh
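
The grading step calls the OpenAI API, so an API key presumably needs to be exported beforehand (an assumption; check the script for the exact environment variable and model it uses):

export OPENAI_API_KEY=<your-openai-api-key>
sh scripts/eval/eval_mmvet.sh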

Visual Grounding on GAPartNet

To evaluate the performance of ShapeLLM on the GAPartNet dataset, run the script:

sh scripts/eval/gapartnet_ref.sh

Calculate the generative 3D visual grounding accuracy:

sh scripts/eval/eval_gapartnet.sh