---
license: mit
tags:
- reinforcement learning
- world model
- continuous control
- robotics
pipeline_tag: reinforcement-learning
---
# Model Card for TD-MPC2

Official release of TD-MPC2 model checkpoints for the paper *TD-MPC2: Scalable, Robust World Models for Continuous Control* by Nicklas Hansen, Hao Su*, Xiaolong Wang* (UC San Diego).
Quick links: [Website] [Paper] [Dataset]
## Model Details
We open-source a total of 324 TD-MPC2 model checkpoints, including 12 multi-task models (ranging from 1M to 317M parameters) trained on 80, 70, and 30 tasks, respectively. We are excited to see what the community will do with these models, and hope that our release will encourage other research labs to open-source their checkpoints as well. This section aims to provide further details about the released models.
### Model Description
- Developed by: Nicklas Hansen (UC San Diego)
- Model type: TD-MPC2 models trained on tasks from DMControl, Meta-World, ManiSkill2, and MyoSuite.
- License: MIT
### Model Sources
- Repository: https://github.com/nicklashansen/tdmpc2
- Paper: https://arxiv.org/abs/2310.16828
## Uses
As one of the first major releases of model checkpoints for reinforcement learning, use of our TD-MPC2 checkpoints is fairly open-ended. We envision that our checkpoints will be useful for researchers interested in training, finetuning, evaluating, and analyzing models on any of the 104 continuous control tasks that we release models for. However, we also expect the community to discover new use cases for these checkpoints.
### Direct Use

Model checkpoints can be loaded using the official implementation and then used to reproduce our results and/or generate trajectories for any of the supported tasks.
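As a minimal sketch, the released checkpoints are ordinary PyTorch save files, so they can be downloaded and inspected with `torch.load` before being plugged into the official implementation. The file name below is a placeholder, and the exact layout of the saved state is defined by the official code, so this snippet only lists what a checkpoint file contains.

```python
import torch

# Placeholder file name; substitute the path of any downloaded TD-MPC2 checkpoint.
ckpt_path = "checkpoint.pt"

# Checkpoints are standard PyTorch save files; torch.load returns whatever the
# official implementation stored (e.g. model weights and related metadata).
state = torch.load(ckpt_path, map_location="cpu")

if isinstance(state, dict):
    for key, value in state.items():
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(key, shape)
else:
    print(type(state))
```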
### Out-of-Scope Use
We do not expect our model checkpoints to generalize to new (unseen) tasks as is. Such model usage will most likely require some amount of fine-tuning on target task data.
## How to Get Started with the Models
Refer to the official implementation for installation instructions and example usage.
## Training Details

We describe the training procedures for single-task and multi-task model checkpoints below.
### Training Procedure (Single-task)
Single-task checkpoints are trained using the official implementation with default hyperparameters. All models have 5M parameters. Most, but not all, models are trained until convergence. Refer to the individual task curves in our paper for a detailed breakdown of model performance on each task.
### Training Procedure (Multi-task)
Multi-task checkpoints are trained using the official implementation with `batch_size=1024` and otherwise default hyperparameters. We release checkpoints trained on the 80-task and 30-task datasets provided here, as well as on a 70-task dataset obtained by filtering the 80-task dataset based on task IDs. We release model checkpoints ranging from 1M to 317M parameters.
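For illustration, a hedged sketch of the task-ID filtering step mentioned above is shown below. The shard file names, the per-episode dictionary layout (including the `task_id` key), and the excluded task IDs are all assumptions made for this example, not the actual format of the released dataset; consult the official dataset release for the real layout.

```python
import torch

# Illustrative sketch of deriving a smaller multi-task dataset by dropping
# episodes whose task ID is excluded. File names, the per-episode dict layout
# (including the "task_id" key), and the excluded IDs below are assumptions.
excluded_task_ids = {70, 71, 72}              # placeholder IDs, not the real excluded tasks
episodes = torch.load("mt80_shard_0.pt")      # placeholder shard of the 80-task dataset
kept = [ep for ep in episodes if int(ep["task_id"]) not in excluded_task_ids]
torch.save(kept, "mt70_shard_0.pt")           # placeholder output shard
print(f"kept {len(kept)} of {len(episodes)} episodes")
```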
## Environmental Impact
Carbon emissions are estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GeForce RTX 3090
- Hours used: Approx. 50,000
- Provider: Private infrastructure
- Carbon Emitted: Approx. 7560 kg CO2eq
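As a rough sanity check, the figure above can be reproduced with a back-of-the-envelope calculation, assuming the RTX 3090's 350 W board power and an average grid carbon intensity of roughly 0.432 kg CO2eq/kWh; both values are assumptions rather than measurements.

```python
# Back-of-the-envelope check of the estimate above. The 350 W board power and the
# 0.432 kg CO2eq/kWh grid carbon intensity are assumptions, not measured values.
power_kw = 350 / 1000                      # assumed RTX 3090 board power, in kW
gpu_hours = 50_000                         # approximate total GPU hours (see above)
intensity_kg_per_kwh = 0.432               # assumed average grid carbon intensity
energy_kwh = power_kw * gpu_hours          # = 17,500 kWh
emissions_kg = energy_kwh * intensity_kg_per_kwh
print(f"{emissions_kg:,.0f} kg CO2eq")     # ~7,560 kg CO2eq
```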
## Citation
If you find our work useful, please consider citing the paper as follows:
BibTeX:
@misc{hansen2023tdmpc2,
  title={TD-MPC2: Scalable, Robust World Models for Continuous Control},
  author={Nicklas Hansen and Hao Su and Xiaolong Wang},
  year={2023},
  eprint={2310.16828},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
## Contact
Correspondence to: Nicklas Hansen