metadata
license: cdla-permissive-2.0
datasets:
  - microsoft/mocapact-data

MoCapAct Model Zoo

Control of simulated humanoid characters is a challenging benchmark for sequential decision-making methods, as it assesses a policy’s ability to drive an inherently unstable, discontinuous, and high-dimensional physical system. Motion capture (MoCap) data can be very helpful in learning sophisticated locomotion policies by teaching a humanoid agent low-level skills (e.g., standing, walking, and running) that can then be used to generate high-level behaviors. However, even with MoCap data, controlling simulated humanoids remains very hard, because this data offers only kinematic information. Finding physical control inputs to realize the MoCap-demonstrated motions has required methods like reinforcement learning that need large amounts of compute, which has effectively served as a barrier to entry for this exciting research direction.

In an effort to broaden participation and facilitate evaluation of ideas in humanoid locomotion research, we are releasing MoCapAct (Motion Capture with Actions), a library of high-quality pre-trained agents that can track over three hours of MoCap data for a simulated humanoid in the dm_control physics-based environment, along with rollouts from these experts containing proprioceptive observations and actions. MoCapAct allows researchers to sidestep the computationally intensive task of training low-level control policies from MoCap data and instead use MoCapAct's expert agents and demonstrations for learning advanced locomotion behaviors. It also allows researchers to improve on our low-level policies by using them and their demonstration data as a starting point.

In our work, we use MoCapAct to train a single hierarchical policy capable of tracking the entire MoCap dataset within dm_control. We then re-use the learned low-level component to efficiently learn other high-level tasks. Finally, we use MoCapAct to train an autoregressive GPT model and show that it can perform natural motion completion given a motion prompt. We encourage the reader to visit our project website to see videos of our results and to find links to our paper and code.

Model Zoo Structure

The file structure of the model zoo is:

├── all
│   └── experts
│       ├── experts_1.tar.gz
│       ├── experts_2.tar.gz
│       ...
│       └── experts_8.tar.gz
│
├── sample
│   └── experts.tar.gz
│
├── multiclip_policy.tar.gz
│   ├── full_dataset
│   └── locomotion_dataset
│
├── transfer.tar.gz
│   ├── go_to_target
│   │   ├── general_low_level
│   │   ├── locomotion_low_level
│   │   └── no_low_level
│   │
│   └── velocity_control
│       ├── general_low_level
│       ├── locomotion_low_level
│       └── no_low_level
│
├── gpt.ckpt
│
└── videos
    ├── full_clip_videos.tar.gz
    └── snippet_videos.tar.gz

Experts Tarball Files

The experts are distributed in the following tarball files:

  • all/experts/experts_*.tar.gz: Contains all of the clip snippet experts. Due to file size limitations, we split the experts among multiple tarball files.
  • sample/experts.tar.gz: Contains the clip snippet experts used to run the examples on the dataset website.

The expert structure is detailed in Appendix A.1 of the paper and at https://github.com/microsoft/MoCapAct#description.
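
Because the full set of experts is split across several tarballs, it can be convenient to unpack them all into one directory. The following is a minimal sketch using only the Python standard library; `extract_expert_archives` is a hypothetical helper name, and the sketch makes no assumption about the directory layout inside each archive beyond unpacking it as-is.

```python
import tarfile
from pathlib import Path


def extract_expert_archives(archive_dir, output_dir):
    """Extract every experts_*.tar.gz in archive_dir into output_dir.

    Returns the archives that were extracted, in sorted order.
    """
    archive_dir, output_dir = Path(archive_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    archives = sorted(archive_dir.glob("experts_*.tar.gz"))
    for archive in archives:
        # The experts are split across the tarballs, so every archive is
        # unpacked into the same destination directory.
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(output_dir)
    return archives
```

Calling `extract_expert_archives("/path/to/all/experts", "/path/to/output")` would unpack `experts_1.tar.gz` through `experts_8.tar.gz` in turn.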

An expert can be loaded and rolled out in Python as in the following example:

from mocapact import observables
from mocapact.sb3 import utils
# Load the expert trained on steps 0-194 of clip CMU_083_33.
expert_path = "/path/to/experts/CMU_083_33/CMU_083_33-0-194/eval_rsi/model"
expert = utils.load_policy(expert_path, observables.TIME_INDEX_OBSERVABLES)

from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types
# Build a tracking environment for the same clip snippet.
dataset = types.ClipCollection(ids=['CMU_083_33'], start_steps=[0], end_steps=[194])
env = tracking.MocapTrackingGymEnv(dataset)
# Roll out the expert deterministically until the episode ends, printing each reward.
obs, done = env.reset(), False
while not done:
    action, _ = expert.predict(obs, deterministic=True)
    obs, rew, done, _ = env.step(action)
    print(rew)

Alternatively, an expert can be rolled out from the command line:

python -m mocapact.clip_expert.evaluate \
  --policy_root /path/to/experts/CMU_016_22/CMU_016_22-0-82/eval_rsi/model \
  --act_noise 0 \
  --ghost_offset 1 \
  --always_init_at_clip_start
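
In both examples above, the snippet directory name (`CMU_083_33-0-194`, `CMU_016_22-0-82`) appears to encode the clip id, start step, and end step. The sketch below assumes that `<clip id>-<start step>-<end step>` naming, which is inferred from these two paths rather than stated here, and turns snippet names into the arguments passed to `ClipCollection` in the Python example. `parse_snippet_name` and `clip_collection_kwargs` are hypothetical helper names.

```python
def parse_snippet_name(name):
    """Split 'CMU_083_33-0-194' into ('CMU_083_33', 0, 194).

    Assumes the name ends in '-<start step>-<end step>'.
    """
    clip_id, start, end = name.rsplit("-", 2)
    return clip_id, int(start), int(end)


def clip_collection_kwargs(snippet_names):
    """Build the ids/start_steps/end_steps lists for a set of snippets."""
    parsed = [parse_snippet_name(name) for name in snippet_names]
    return {
        "ids": [clip_id for clip_id, _, _ in parsed],
        "start_steps": [start for _, start, _ in parsed],
        "end_steps": [end for _, _, end in parsed],
    }


print(clip_collection_kwargs(["CMU_083_33-0-194"]))
# {'ids': ['CMU_083_33'], 'start_steps': [0], 'end_steps': [194]}
```

The resulting dictionary can be unpacked as `types.ClipCollection(**kwargs)` to track several snippets in one environment.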

GPT

The GPT policy is contained in gpt.ckpt and can be loaded using PyTorch Lightning:

from mocapact.distillation import model
policy = model.GPTPolicy.load_from_checkpoint('/path/to/gpt.ckpt', map_location='cpu')

This policy can be used with mocapact/distillation/motion_completion.py, as in the following example:

python -m mocapact.distillation.motion_completion \
  --policy_path /path/to/gpt.ckpt \
  --nodeterministic \
  --ghost_offset 1 \
  --expert_root /path/to/experts/CMU_016_25 \
  --max_steps 500 \
  --always_init_at_clip_start \
  --prompt_length 32 \
  --min_steps 32 \
  --device cuda \
  --clip_snippet CMU_016_25

Multi-Clip Policy

The multiclip_policy.tar.gz file contains two policies:

  • full_dataset: Trained on the entire MoCapAct dataset
  • locomotion_dataset: Trained on the locomotion_small portion of the MoCapAct dataset

Taking full_dataset as an example, a multi-clip policy can be loaded using PyTorch Lightning:

from mocapact.distillation import model
policy = model.NpmpPolicy.load_from_checkpoint('/path/to/multiclip_policy/full_dataset/model/model.ckpt', map_location='cpu')

The policy can be used with mocapact/distillation/evaluate.py, as in the following example:

python -m mocapact.distillation.evaluate \
  --policy_path /path/to/multiclip_policy/full_dataset/model/model.ckpt \
  --act_noise 0 \
  --ghost_offset 1 \
  --always_init_at_clip_start \
  --termination_error_threshold 10 \
  --clip_snippets CMU_016_22

Transfer

The transfer.tar.gz file contains policies for downstream tasks. The main difference between the contained folders is which low-level policy is used:

  • general_low_level: Low-level policy comes from multiclip_policy/full_dataset
  • locomotion_low_level: Low-level policy comes from multiclip_policy/locomotion_dataset
  • no_low_level: No low-level policy used

The policy structure is as follows:

├── best_model.zip
├── low_level_policy.ckpt
└── vecnormalize.pkl

The low_level_policy.ckpt (only present in general_low_level and locomotion_low_level) contains the low-level policy and is loaded with PyTorch Lightning. The best_model.zip file contains the task policy parameters. The vecnormalize.pkl file contains the observation normalizer. The latter two files are loaded with Stable-Baselines3.
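
Since `low_level_policy.ckpt` is present only in some folders, a script that handles all three variants may want to check what a folder contains before loading anything. The following is a minimal sketch using only the standard library; `describe_transfer_policy` is a hypothetical helper name, and the sketch assumes that `best_model.zip` and `vecnormalize.pkl` are present in every policy folder.

```python
from pathlib import Path


def describe_transfer_policy(model_root):
    """Locate the files of one transfer policy folder.

    Returns a dict of paths; 'low_level_policy' is None when the folder
    has no low_level_policy.ckpt (the no_low_level case).
    """
    root = Path(model_root)
    task_policy = root / "best_model.zip"
    normalizer = root / "vecnormalize.pkl"
    low_level = root / "low_level_policy.ckpt"

    # Assumption: the task policy and normalizer exist in every folder.
    missing = [p.name for p in (task_policy, normalizer) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")

    return {
        "task_policy": task_policy,
        "normalizer": normalizer,
        "low_level_policy": low_level if low_level.is_file() else None,
    }
```

For example, `describe_transfer_policy("/path/to/transfer/go_to_target/no_low_level")["low_level_policy"]` would be `None` under these assumptions.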

The policy can be used with mocapact/transfer/evaluate.py, as in the following example:

python -m mocapact.transfer.evaluate \
  --model_root /path/to/transfer/go_to_target/general_low_level \
  --task /path/to/mocapact/transfer/config.py:go_to_target

MoCap Videos

There are two tarball files containing videos of the MoCap clips in the dataset:

  • full_clip_videos.tar.gz contains videos of the full MoCap clips.
  • snippet_videos.tar.gz contains videos of the snippets that were used to train the experts. Note that they are playbacks of the clips themselves, not rollouts of the corresponding experts.