---
license: mit
tags:
- reinforcement learning
- world model
- continuous control
- robotics
pipeline_tag: reinforcement-learning
---
# Model Card for TD-MPC2

Official release of TD-MPC2 model checkpoints for the paper *TD-MPC2: Scalable, Robust World Models for Continuous Control* by Nicklas Hansen, Hao Su*, Xiaolong Wang* (UC San Diego).
Quick links: [Website] [Paper] [Dataset]
## Model Details
We open-source a total of 324 TD-MPC2 model checkpoints, including 12 multi-task models (ranging from 1M to 317M parameters) trained on 80, 70, and 30 tasks, respectively. We are excited to see what the community will do with these models, and hope that our release will encourage other research labs to open-source their checkpoints as well. This section aims to provide further details about the released models.
### Model Description
- Developed by: Nicklas Hansen (UC San Diego)
- Model type: TD-MPC2 models trained on tasks from DMControl, Meta-World, ManiSkill2, and MyoSuite.
- License: MIT
### Model Sources
- Repository: https://github.com/nicklashansen/tdmpc2
- Paper: https://arxiv.org/abs/2310.16828
## Uses
As one of the first major releases of model checkpoints for reinforcement learning, use of our TD-MPC2 checkpoints is fairly open-ended. We envision that our checkpoints will be useful for researchers interested in training, finetuning, evaluating, and analyzing models on any of the 104 continuous control tasks that we release models for. However, we also expect the community to discover new use cases for these checkpoints.
### Direct Use

Model checkpoints can be loaded using the official implementation and then used to reproduce our results and/or generate trajectories for any of the supported tasks.
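As a minimal sketch, the released checkpoints are ordinary PyTorch save files, so they can be downloaded and inspected with `torch.load` before being plugged into the official implementation. The file name below is a placeholder, and the exact layout of the saved state is defined by the official code, so this snippet only lists what a checkpoint file contains.

```python
import torch

# Placeholder file name; substitute the path of any downloaded TD-MPC2 checkpoint.
ckpt_path = "checkpoint.pt"

# Checkpoints are standard PyTorch save files; torch.load returns whatever the
# official implementation stored (e.g. model weights and related metadata).
state = torch.load(ckpt_path, map_location="cpu")

if isinstance(state, dict):
    for key, value in state.items():
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(key, shape)
else:
    print(type(state))
```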
### Out-of-Scope Use
We do not expect our model checkpoints to generalize to new (unseen) tasks as is. Such model usage will most likely require some amount of fine-tuning on target task data.
## How to Get Started with the Models
Refer to the official implementation for installation instructions and example usage.
## Training Details

We describe the training procedures for single-task and multi-task model checkpoints below.
### Training Procedure (Single-task)
Single-task checkpoints are trained using the official implementation with default hyperparameters. All models have 5M parameters. Most, but not all, models are trained until convergence. Refer to the individual task curves in our paper for a detailed breakdown of model performance on each task.
### Training Procedure (Multi-task)
Multi-task checkpoints are trained using the official implementation with `batch_size=1024` and otherwise default hyperparameters. We release checkpoints trained on the 80-task and 30-task datasets provided here, as well as on a 70-task dataset obtained by filtering the 80-task dataset based on task IDs. We release model checkpoints ranging from 1M to 317M parameters.
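For illustration, a hedged sketch of the task-ID filtering step mentioned above is shown below. The shard file names, the per-episode dictionary layout (including the `task_id` key), and the excluded task IDs are all assumptions made for this example, not the actual format of the released dataset; consult the official dataset release for the real layout.

```python
import torch

# Illustrative sketch of deriving a smaller multi-task dataset by dropping
# episodes whose task ID is excluded. File names, the per-episode dict layout
# (including the "task_id" key), and the excluded IDs below are assumptions.
excluded_task_ids = {70, 71, 72}              # placeholder IDs, not the real excluded tasks
episodes = torch.load("mt80_shard_0.pt")      # placeholder shard of the 80-task dataset
kept = [ep for ep in episodes if int(ep["task_id"]) not in excluded_task_ids]
torch.save(kept, "mt70_shard_0.pt")           # placeholder output shard
print(f"kept {len(kept)} of {len(episodes)} episodes")
```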
## Environmental Impact
Carbon emissions are estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GeForce RTX 3090
- Hours used: Approx. 50,000
- Provider: Private infrastructure
- Carbon Emitted: Approx. 7560 kg CO2eq
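As a rough sanity check, the figure above can be reproduced with a back-of-the-envelope calculation, assuming the RTX 3090's 350 W board power and an average grid carbon intensity of roughly 0.432 kg CO2eq/kWh; both values are assumptions rather than measurements.

```python
# Back-of-the-envelope check of the estimate above. The 350 W board power and the
# 0.432 kg CO2eq/kWh grid carbon intensity are assumptions, not measured values.
power_kw = 350 / 1000                      # assumed RTX 3090 board power, in kW
gpu_hours = 50_000                         # approximate total GPU hours (see above)
intensity_kg_per_kwh = 0.432               # assumed average grid carbon intensity
energy_kwh = power_kw * gpu_hours          # = 17,500 kWh
emissions_kg = energy_kwh * intensity_kg_per_kwh
print(f"{emissions_kg:,.0f} kg CO2eq")     # ~7,560 kg CO2eq
```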
## Citation
If you find our work useful, please consider citing the paper as follows:
BibTeX:
@misc{hansen2023tdmpc2,
  title={TD-MPC2: Scalable, Robust World Models for Continuous Control},
  author={Nicklas Hansen and Hao Su and Xiaolong Wang},
  year={2023},
  eprint={2310.16828},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
## Contact
Correspondence to: Nicklas Hansen