Commit 38f2a4d by nicklashansen (1 parent: 21042a6)

Update README.md

README.md CHANGED
@@ -1,3 +1,91 @@
---
license: mit
tags:
- reinforcement learning
- world model
- continuous control
- robotics
---

# Model Card for TD-MPC2

Official release of TD-MPC2 model checkpoints for the paper

[TD-MPC2: Scalable, Robust World Models for Continuous Control](https://www.tdmpc2.com) by

[Nicklas Hansen](https://nicklashansen.github.io), [Hao Su](https://cseweb.ucsd.edu/~haosu)\*, [Xiaolong Wang](https://xiaolonw.github.io)\* (UC San Diego)

**Quick links:** [[Website]](https://www.tdmpc2.com) [[Paper]](https://openreview.net/pdf?id=Oxh5CstDJU) [[OpenReview]](https://openreview.net/forum?id=Oxh5CstDJU) [[Models]](https://www.tdmpc2.com/models) [[Dataset]](https://www.tdmpc2.com/dataset)


## Model Details

We open-source a total of 324 TD-MPC2 model checkpoints, including 12 multi-task models (ranging from 1M to 317M parameters) trained on 80, 70, or 30 tasks. We are excited to see what the community will do with these models, and hope that our release will encourage other research labs to open-source their checkpoints as well. This section provides further details about the released models.


### Model Description

- **Developed by:** [Nicklas Hansen](https://nicklashansen.github.io) (UC San Diego)
- **Model type:** 324 single-task and multi-task TD-MPC2 checkpoints trained on tasks from DMControl, Meta-World, ManiSkill2, and MyoSuite.
- **License:** MIT

### Model Sources

- **Repository:** [https://github.com/nicklashansen/tdmpc2](https://github.com/nicklashansen/tdmpc2)
- **Paper:** [https://www.tdmpc2.com](https://www.tdmpc2.com)

## Uses

This is one of the first major releases of model checkpoints for reinforcement learning, so the intended use of our TD-MPC2 checkpoints is fairly open-ended. We envision that our checkpoints will be useful for researchers interested in training, finetuning, evaluating, and analyzing models on any of the 104 continuous control tasks that we release models for. However, we also expect the community to discover new use cases for these checkpoints.

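For orientation, the counts quoted in this card can be cross-checked with a little arithmetic. The sketch below uses only numbers stated here (324 checkpoints in total, 12 of them multi-task, 104 tasks). Reading the resulting ratio as a fixed number of single-task checkpoints per task is an assumption on our part; the card itself does not state how the single-task checkpoints are distributed.

```python
# Counts taken from this model card.
TOTAL_CHECKPOINTS = 324
MULTI_TASK_CHECKPOINTS = 12
NUM_TASKS = 104

# Every checkpoint that is not multi-task is a single-task checkpoint.
single_task_checkpoints = TOTAL_CHECKPOINTS - MULTI_TASK_CHECKPOINTS

# Assumption: single-task checkpoints are spread evenly across the tasks.
per_task, remainder = divmod(single_task_checkpoints, NUM_TASKS)

print(f"single-task checkpoints: {single_task_checkpoints}")
print(f"per task if evenly spread: {per_task} (remainder {remainder})")
```

The division comes out even, which is at least consistent with the same number of single-task checkpoints being released for every task.
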
### Direct Use

Model checkpoints can be loaded using the [official implementation](https://github.com/nicklashansen/tdmpc2) and then used to reproduce our results and/or generate trajectories for any of the supported tasks.

### Out-of-Scope Use

We do not expect our model checkpoints to generalize to new (unseen) tasks as is. Such usage will most likely require some amount of fine-tuning on target task data.

## How to Get Started with the Model

Refer to the [official implementation](https://github.com/nicklashansen/tdmpc2) for installation instructions and example usage.

## Training Details

The following subsections describe the training procedure for single-task and multi-task model checkpoints.

### Training Procedure (Single-task)

Single-task checkpoints are trained using the [official implementation](https://github.com/nicklashansen/tdmpc2) with default hyperparameters. All models have 5M parameters. Most, but not all, models are trained to convergence. Refer to the individual task curves in our [paper](https://www.tdmpc2.com) for a detailed breakdown of model performance on each task.

### Training Procedure (Multi-task)

Multi-task checkpoints are trained using the [official implementation](https://github.com/nicklashansen/tdmpc2) with `batch_size=1024` and otherwise default hyperparameters. We release checkpoints trained on the 80-task and 30-task datasets provided [here](https://huggingface.co/datasets/nicklashansen/tdmpc2), as well as on a 70-task dataset obtained by filtering the 80-task dataset based on task IDs. We release model checkpoints ranging from 1M to 317M parameters.

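The 70-task dataset is described above as a task-ID filter over the 80-task dataset. A minimal sketch of that kind of filter follows. The record layout (a dict with a `task_id` field), the helper name `filter_by_task_ids`, and the particular held-out IDs are all hypothetical; they are not taken from the official implementation or from the released dataset.

```python
from typing import Any, Dict, Iterable, Iterator, Set

# Hypothetical record layout: one dict per transition with a "task_id" field.
Transition = Dict[str, Any]


def filter_by_task_ids(
    transitions: Iterable[Transition], keep_ids: Set[int]
) -> Iterator[Transition]:
    """Yield only the transitions whose task ID is in keep_ids."""
    for transition in transitions:
        if transition["task_id"] in keep_ids:
            yield transition


# Toy stand-in for an 80-task dataset: two transitions for each task ID 0..79.
dataset_80 = [{"task_id": t, "step": s} for t in range(80) for s in range(2)]

# Hypothetical choice of 10 task IDs to drop; the real IDs are not given here.
held_out_ids = set(range(70, 80))
keep_ids = set(range(80)) - held_out_ids

dataset_70 = list(filter_by_task_ids(dataset_80, keep_ids))
remaining_tasks = {tr["task_id"] for tr in dataset_70}
print(len(remaining_tasks), len(dataset_70))
```

Filtering by ID rather than re-collecting data keeps the 70-task subset a strict subset of the 80-task data, so the two remain directly comparable.
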
## Environmental Impact

Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA GeForce RTX 3090
- **Hours used:** Approx. 50,000
- **Provider:** Private infrastructure
- **Carbon Emitted:** Approx. 7,560 kg CO2eq

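The reported total is consistent with the usual energy-times-carbon-intensity estimate. The sketch below assumes a power draw of 350 W per GPU (the RTX 3090's rated board power) and a carbon intensity of 0.432 kg CO2eq/kWh. Neither value is stated in this card; they are assumptions chosen because, combined with the 50,000 GPU hours listed above, they reproduce the reported figure.

```python
# Stated in this model card.
GPU_HOURS = 50_000

# Assumptions (not stated in this card).
GPU_POWER_WATTS = 350      # rated board power of an RTX 3090
CARBON_INTENSITY = 0.432   # kg CO2eq emitted per kWh consumed

# Energy in kWh = hours * watts / 1000.
energy_kwh = GPU_HOURS * GPU_POWER_WATTS / 1000

# Emissions in kg CO2eq = energy * carbon intensity.
emissions_kg = energy_kwh * CARBON_INTENSITY

print(f"energy: {energy_kwh:,.0f} kWh")
print(f"emissions: {emissions_kg:,.0f} kg CO2eq")
```

Under these assumptions the estimate scales linearly with GPU hours, so a different power draw or grid intensity changes the total proportionally.
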
## Citation

If you find our work useful, please consider citing the paper as follows:

**BibTeX:**
```
@article{Hansen2023TDMPC2,
	title={TD-MPC2: Scalable, Robust World Models for Continuous Control},
	author={Nicklas Hansen and Hao Su and Xiaolong Wang},
	booktitle={arXiv},
	url={https://www.tdmpc2.com},
	year={2023}
}
```

## Contact

Correspondence to: [Nicklas Hansen](https://nicklashansen.github.io)