|
--- |
|
inference: false |
|
pipeline_tag: image-text-to-text |
|
datasets: |
|
- teamcraft/TeamCraft-Data-Dec |
|
--- |
|
<br> |
|
<br> |
|
|
|
# TeamCraft-VLA-7B-Dec Model Card |
|
|
|
TeamCraft-VLA-7B-Dec is a multi-modal vision-language-action model designed for decentralized multi-agent collaboration. At each timestep, the model encodes a multi-modal prompt that specifies the task together with one agent's visual observation and inventory, and generates an action for that single agent in a multi-agent setting. |
|
|
|
## Usage |
|
|
|
We provide a full environment with detailed running instructions on [GitHub](https://github.com/teamcraft-bench/teamcraft). |
|
|
|
## Model details |
|
|
|
The TeamCraft-VLA (Vision-Language-Action) architecture integrates a CLIP ViT-L/14 visual encoder with a linear projector for modality alignment and Vicuna-v1.5-7B (based on Llama 2) as the LLM backbone. Visual and text embeddings are combined to generate actions for multi-agent tasks. |
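
The data flow described above can be illustrated with a minimal NumPy sketch. It is not the TeamCraft implementation: the function names, the random weights, the number of image patches, the number of text tokens, and the visual-first token ordering are all assumptions made for illustration. Only the hidden sizes (1024 for CLIP ViT-L/14, 4096 for a 7B Llama 2 model) reflect the named components.

```python
import numpy as np

# Hidden sizes of the named components.
VISION_DIM = 1024  # CLIP ViT-L/14 feature width
LLM_DIM = 4096     # embedding width of a 7B Llama 2 model such as Vicuna-v1.5-7B


def project_visual_tokens(patch_features, weight, bias):
    """Linear projector: map visual features into the LLM embedding space."""
    return patch_features @ weight + bias


def build_llm_input(visual_embeds, text_embeds):
    """Join projected visual tokens and text embeddings along the sequence axis.

    Placing the visual tokens first is an assumption of this sketch.
    """
    return np.concatenate([visual_embeds, text_embeds], axis=0)


rng = np.random.default_rng(0)
num_patches = 256      # assumption: patch tokens produced for one image
num_text_tokens = 32   # assumption: length of the tokenized task prompt

# Random stand-ins for encoder outputs, text embeddings, and projector weights.
patch_features = rng.standard_normal((num_patches, VISION_DIM)).astype(np.float32)
text_embeds = rng.standard_normal((num_text_tokens, LLM_DIM)).astype(np.float32)
weight = rng.standard_normal((VISION_DIM, LLM_DIM)).astype(np.float32) * 0.02
bias = np.zeros(LLM_DIM, dtype=np.float32)

visual_embeds = project_visual_tokens(patch_features, weight, bias)
llm_input = build_llm_input(visual_embeds, text_embeds)
print(llm_input.shape)  # (288, 4096)
```

In the real model the projector weights are learned and the LLM backbone consumes the combined sequence to decode an action; see the GitHub repository for the actual code.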
|
|
|
**Model Type:** |
|
|
|
- Vision-Language Action Model |
|
|
|
**Model version:** |
|
|
|
- v1.0 |
|
|
|
**Model date:** |
|
|
|
- TeamCraft-VLA-7B-Dec was trained in September 2024 |
|
|
|
**Training dataset:** |
|
|
|
- [TeamCraft decentralized full dataset](https://huggingface.co/datasets/teamcraft/TeamCraft-Data-Dec) |
|
|
|
|
|
## Uses |
|
|
|
### Direct use |
|
- **Primary intended uses:** |
|
The primary use of TeamCraft-VLA-7B-Dec is research on multi-agent systems in multi-modal settings. |
|
|
|
- **Primary intended users:** |
|
The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, multi-agent systems, and artificial intelligence. |
|
|
|
### Out-of-scope use |
|
|
|
- The model is not designed for real-world decision-making or deployment in safety-critical systems. |
|
|
|
- The model should not be used for tasks requiring ethical reasoning, moral judgments, or any applications where improper actions could lead to harm or violation of regulations. |
|
|
|
## License |
|
Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved. |