nicklashansen committed on
Commit
38f2a4d
1 Parent(s): 21042a6

Update README.md

Files changed (1)
  1. README.md +88 -0
README.md CHANGED
@@ -1,3 +1,91 @@
1
  ---
2
  license: mit
3
  ---
1
  ---
2
  license: mit
3
+ tags:
4
+ - reinforcement learning
5
+ - world model
6
+ - continuous control
7
+ - robotics
8
  ---
9
+
10
+ # Model Card for TD-MPC2
11
+
12
+ Official release of TD-MPC2 model checkpoints for the paper
13
+
14
+ [Scalable, Robust World Models for Continuous Control](https://www.tdmpc2.com) by
15
+
16
+ [Nicklas Hansen](https://nicklashansen.github.io), [Hao Su](https://cseweb.ucsd.edu/~haosu)\*, [Xiaolong Wang](https://xiaolonw.github.io)\* (UC San Diego)
17
+
18
+ **Quick links:** [[Website]](https://www.tdmpc2.com) [[Paper]](https://openreview.net/pdf?id=Oxh5CstDJU) [[OpenReview]](https://openreview.net/forum?id=Oxh5CstDJU) [[Models]](https://www.tdmpc2.com/models) [[Dataset]](https://www.tdmpc2.com/dataset)
19
+
20
+
21
+ ## Model Details
22
+
23
+ We open-source a total of 324 TD-MPC2 model checkpoints, including 12 multi-task models (ranging from 1M to 317M parameters) trained on 80, 70, or 30 tasks. We are excited to see what the community will do with these models, and hope that our release will encourage other research labs to open-source their checkpoints as well. This section provides further details about the released models.
24
+
25
+
26
+ ### Model Description
27
+
28
+ - **Developed by:** [Nicklas Hansen](https://nicklashansen.github.io) (UC San Diego)
29
+ - **Model type:** 324 single-task and multi-task TD-MPC2 checkpoints trained on tasks from DMControl, Meta-World, ManiSkill2, and MyoSuite.
30
+ - **License:** MIT
31
+
32
+ ### Model Sources
33
+
34
+ - **Repository:** [https://github.com/nicklashansen/tdmpc2](https://github.com/nicklashansen/tdmpc2)
35
+ - **Paper:** [https://www.tdmpc2.com](https://www.tdmpc2.com)
36
+
37
+ ## Uses
38
+
39
+ Because this is one of the first major releases of model checkpoints for reinforcement learning, the intended use of our TD-MPC2 checkpoints is fairly open-ended. We envision that the checkpoints will be useful for researchers interested in training, fine-tuning, evaluating, and analyzing models on any of the 104 continuous control tasks that we release models for. We also expect the community to discover new use cases for these checkpoints.
40
+
41
+ ### Direct Use
42
+
43
+ Model checkpoints can be loaded using the [official implementation](https://github.com/nicklashansen/tdmpc2) and then used to reproduce our results and/or generate trajectories for any of the supported tasks.
44
+
45
+ ### Out-of-Scope Use
46
+
47
+ We do not expect our model checkpoints to generalize to new (unseen) tasks as-is. Using them on such tasks will most likely require some amount of fine-tuning on target-task data.
48
+
49
+ ## How to Get Started with the Model
50
+
51
+ Refer to the [official implementation](https://github.com/nicklashansen/tdmpc2) for installation instructions and example usage.
52
+
53
+ ## Training Details
54
+
55
+ The following subsections describe the training procedure for single-task and multi-task model checkpoints.
56
+
57
+ ### Training Procedure (Single-task)
58
+
59
+ Single-task checkpoints are trained using the [official implementation](https://github.com/nicklashansen/tdmpc2) with default hyperparameters. All models have 5M parameters. Most, but not all, models are trained until convergence. Refer to the individual task curves in our [paper](https://www.tdmpc2.com) for a detailed breakdown of model performance on each task.
60
+
61
+ ### Training Procedure (Multi-task)
62
+
63
+ Multi-task checkpoints are trained using the [official implementation](https://github.com/nicklashansen/tdmpc2) with `batch_size=1024` and otherwise default hyperparameters. We release checkpoints trained on the 80-task and 30-task datasets provided [here](https://huggingface.co/datasets/nicklashansen/tdmpc2), as well as a 70-task dataset that is obtained by filtering the 80-task dataset based on task IDs. We release model checkpoints ranging from 1M to 317M parameters.
64
+
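Filtering a multi-task dataset by task ID, as described for the 70-task dataset, can be sketched as follows. The record layout (a dictionary with a `task_id` key), the function name `filter_by_task_ids`, and the example IDs are all hypothetical. The actual dataset format and the set of excluded tasks are defined by the official implementation and are not specified here.

```python
from typing import Iterable


def filter_by_task_ids(episodes: Iterable[dict], keep_ids: set[int]) -> list[dict]:
    """Keep only episodes whose (hypothetical) 'task_id' field is in keep_ids."""
    return [ep for ep in episodes if ep["task_id"] in keep_ids]


if __name__ == "__main__":
    # Toy stand-in for an 80-task dataset: one episode per task ID 0..79.
    episodes = [{"task_id": i, "data": None} for i in range(80)]
    # Hypothetical choice of which IDs to keep; the real selection is not given here.
    keep_ids = set(range(70))
    subset = filter_by_task_ids(episodes, keep_ids)
    # Number of distinct tasks remaining after filtering.
    print(len({ep["task_id"] for ep in subset}))
```

The same pattern applies to any container of per-episode records, as long as each record exposes the task it was collected on.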
65
+ ## Environmental Impact
66
+
67
+ Carbon emissions are estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
68
+
69
+ - **Hardware Type:** NVIDIA GeForce RTX 3090
70
+ - **Hours used:** Approx. 50,000
71
+ - **Provider:** Private infrastructure
72
+ - **Carbon Emitted:** Approx. 7,560 kg CO2eq
73
+
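The reported emissions figure can be checked with a short back-of-the-envelope calculation. The sketch below assumes a 350 W power draw per GPU (the rated board power of the RTX 3090) and a carbon intensity of 0.432 kg CO2eq per kWh. Neither value is stated in this model card; both are assumptions, chosen because they are consistent with the reported total.

```python
# Back-of-the-envelope check of the emissions estimate.
# ASSUMPTIONS (not stated in the model card): 350 W draw per GPU and
# a carbon intensity of 0.432 kg CO2eq per kWh.
GPU_HOURS = 50_000          # "Hours used" from the model card
GPU_POWER_WATTS = 350       # assumed per-GPU power draw
CARBON_KG_PER_KWH = 0.432   # assumed carbon intensity


def estimate_emissions_kg(gpu_hours: float, power_watts: float, kg_per_kwh: float) -> float:
    """Return estimated emissions in kg CO2eq: energy in kWh times carbon intensity."""
    energy_kwh = gpu_hours * power_watts / 1000.0
    return energy_kwh * kg_per_kwh


if __name__ == "__main__":
    emissions = estimate_emissions_kg(GPU_HOURS, GPU_POWER_WATTS, CARBON_KG_PER_KWH)
    print(f"Estimated emissions: {emissions:.0f} kg CO2eq")
```

Under these assumptions the energy use is 17,500 kWh and the estimate comes to roughly 7,560 kg CO2eq. A different power draw or grid mix would change the result proportionally.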
74
+ ## Citation
75
+
76
+ If you find our work useful, please consider citing the paper as follows:
77
+
78
+ **BibTeX:**
79
+ ```
80
+ @article{Hansen2023TDMPC2,
81
+ title={TD-MPC2: Scalable, Robust World Models for Continuous Control},
82
+ author={Nicklas Hansen and Hao Su and Xiaolong Wang},
83
+ journal={arXiv},
84
+ url={https://www.tdmpc2.com},
85
+ year={2023}
86
+ }
87
+ ```
88
+
89
+ ## Contact
90
+
91
+ Correspondence to: [Nicklas Hansen](https://nicklashansen.github.io)