SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Abstract
Human beings are social animals. How to equip 3D autonomous characters with similar social intelligence that can perceive, understand, and interact with humans remains an open yet fundamental problem. In this paper, we introduce SOLAMI, the first end-to-end Social vision-Language-Action (VLA) Modeling framework for Immersive interaction with 3D autonomous characters. Specifically, SOLAMI builds 3D autonomous characters from three aspects: (1) Social VLA Architecture: We propose a unified social VLA framework that generates multimodal responses (speech and motion) from the user's multimodal input to drive the character in social interaction. (2) Interactive Multimodal Data: We present SynMSI, a synthetic multimodal social interaction dataset generated by an automatic pipeline using only existing motion datasets, addressing the issue of data scarcity. (3) Immersive VR Interface: We develop a VR interface that lets users immersively interact with characters driven by various architectures. Extensive quantitative experiments and user studies demonstrate that our framework produces more precise and natural character responses (in both speech and motion) that align with user expectations, at lower latency.
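As a rough illustration of the end-to-end idea in (1), the sketch below shows how such a character could map a user's speech and body motion to a spoken and embodied reply. Every class, method, and constant name here is an assumption made for exposition, not the paper's actual API or code release.

```python
# Minimal sketch of the unified social VLA loop: the character consumes the
# user's speech and body motion and returns its own speech and motion.
# All names (SocialVLACharacter, the backbone/codec objects, the vocabulary
# split) are illustrative assumptions, not SOLAMI's released interface.
from dataclasses import dataclass
from typing import Any, List, Tuple

SPEECH_VOCAB_SIZE = 4096  # assumed boundary between speech and motion token ids


@dataclass
class MultimodalTurn:
    speech_tokens: List[int]  # discretized speech for one turn
    motion_tokens: List[int]  # discretized body motion for the same turn


def split_by_modality(tokens: List[int]) -> Tuple[List[int], List[int]]:
    # Placeholder split: assumes the shared vocabulary reserves disjoint id ranges.
    speech = [t for t in tokens if t < SPEECH_VOCAB_SIZE]
    motion = [t for t in tokens if t >= SPEECH_VOCAB_SIZE]
    return speech, motion


class SocialVLACharacter:
    def __init__(self, backbone: Any, speech_codec: Any, motion_codec: Any, persona: str):
        self.backbone = backbone          # autoregressive token model (e.g. an LLM)
        self.speech_codec = speech_codec  # waveform <-> discrete speech tokens
        self.motion_codec = motion_codec  # body motion <-> discrete motion tokens
        self.persona = persona            # character setting / system prompt
        self.history: List[MultimodalTurn] = []

    def respond(self, user_audio: Any, user_motion: Any) -> Tuple[Any, Any]:
        # 1. Tokenize both input modalities into the model's shared vocabulary.
        self.history.append(MultimodalTurn(
            speech_tokens=self.speech_codec.encode(user_audio),
            motion_tokens=self.motion_codec.encode(user_motion),
        ))
        # 2. Generate an interleaved speech/motion token reply conditioned on the
        #    persona and the full interaction history.
        out_tokens = self.backbone.generate(persona=self.persona, turns=self.history)
        speech_tokens, motion_tokens = split_by_modality(out_tokens)
        self.history.append(MultimodalTurn(speech_tokens, motion_tokens))
        # 3. Decode back to audio (for playback) and motion (to drive the avatar).
        return self.speech_codec.decode(speech_tokens), self.motion_codec.decode(motion_tokens)
```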
Community
SOLAMI: 3D C.AI in VR powered by a social VLA model.
SOLAMI enables the user to interact with 3D autonomous characters through speech and body language in an immersive VR environment, via an end-to-end social vision-language-action model. Characters can understand the user's body language, act on their commands, and even play simple games.
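On the VR side, a per-turn client loop might look roughly like the sketch below; the headset/avatar helpers and the character object are hypothetical stand-ins for illustration, not the released interface.

```python
# Rough sketch of a per-turn VR client loop, assuming a SocialVLACharacter-like
# object (see the sketch under the abstract) plus hypothetical headset/avatar
# helpers; none of these names come from the SOLAMI release.
def interaction_loop(character, headset, avatar, turn_seconds: float = 4.0) -> None:
    while headset.session_active():
        # 1. Record one user turn: microphone audio plus tracked body/hand motion.
        audio, motion = headset.capture_turn(duration=turn_seconds)
        # 2. Query the end-to-end social VLA model for the character's reply.
        reply_audio, reply_motion = character.respond(audio, motion)
        # 3. Render the reply: play the speech and animate the avatar together,
        #    which is where response latency is most noticeable to the user.
        headset.play_audio(reply_audio)
        avatar.play_motion(reply_motion)
```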
Project page: https://solami-ai.github.io/
Full video demo: https://www.youtube.com/watch?v=P0juJl2Y4So
ArXiv: https://arxiv.org/abs/2412.00174
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this paper:
- MotionGPT-2: A General-Purpose Motion-Language Model for Motion Generation and Understanding (2024)
- Versatile Motion Language Models for Multi-Turn Interactive Agents (2024)
- MotionLLaMA: A Unified Framework for Motion Synthesis and Comprehension (2024)
- LLM Gesticulator: Leveraging Large Language Models for Scalable and Controllable Co-Speech Gesture Synthesis (2024)
- InterMask: 3D Human Interaction Generation via Collaborative Masked Modelling (2024)
- MotionGlot: A Multi-Embodied Motion Generation Model (2024)
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation (2024)