zjowowen committed on
Commit
40c6c8a
1 Parent(s): 4a6b658

Upload README.md with huggingface_hub

  1. README.md +292 -0
README.md ADDED
@@ -0,0 +1,292 @@
+ ---
+ language: en
+ license: apache-2.0
+ library_name: pytorch
+ tags:
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - DI-engine
+ - LunarLander-v2
+ benchmark_name: OpenAI/Gym/Box2d
+ task_name: LunarLander-v2
+ pipeline_tag: reinforcement-learning
+ model-index:
+ - name: TD3
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: OpenAI/Gym/Box2d-LunarLander-v2
+       type: OpenAI/Gym/Box2d-LunarLander-v2
+     metrics:
+     - type: mean_reward
+       value: 244.37 +/- 3.77
+       name: mean_reward
+ ---
+
+ # Play **LunarLander-v2** with **TD3** Policy
+
+ ## Model Description
+ <!-- Provide a longer summary of what this model is. -->
+ This is a simple **TD3** implementation for the OpenAI/Gym/Box2d **LunarLander-v2** environment, built with the [DI-engine library](https://github.com/opendilab/di-engine) and [DI-zoo](https://github.com/opendilab/DI-engine/tree/main/dizoo).
+
+ **DI-engine** is a Python library for solving general decision intelligence problems. It is built on reinforcement learning framework implementations in PyTorch or JAX. The library aims to standardize the reinforcement learning framework across algorithms, benchmarks, and environments, and to support both academic research and prototype applications. Customized training pipelines and applications are also supported by reusing the different abstraction levels of the DI-engine reinforcement learning framework.
+
+ ## Model Usage
+ ### Install the Dependencies
+ <details close>
+ <summary>(Click for Details)</summary>
+
+ ```shell
+ # install huggingface_ding
+ git clone https://github.com/opendilab/huggingface_ding.git
+ pip3 install -e ./huggingface_ding/
+ # install environment dependencies if needed
+ pip3 install DI-engine[common_env]
+ ```
+ </details>
+
+ ### Git Clone from Hugging Face and Run the Model
+
+ <details close>
+ <summary>(Click for Details)</summary>
+
+ ```shell
+ # Run with the trained model
+ python3 -u run.py
+ ```
+ **run.py**
+ ```python
+ from ding.bonus import TD3Agent
+ from ding.config import Config
+ from easydict import EasyDict
+ import torch
+
+ # Load the model from the files git-cloned from Hugging Face
+ policy_state_dict = torch.load("pytorch_model.bin", map_location=torch.device("cpu"))
+ cfg = EasyDict(Config.file_to_dict("policy_config.py"))
+ # Instantiate the agent
+ agent = TD3Agent(
+     env="lunarlander_continuous",
+     exp_name="LunarLander-v2-TD3",
+     cfg=cfg.exp_config,
+     policy_state_dict=policy_state_dict
+ )
+ # Continue training
+ agent.train(step=5000)
+ # Render the new agent's performance
+ agent.deploy(enable_save_replay=True)
+
+ ```
+ </details>
+
+ ### Run Model by Using Huggingface_ding
+
+ <details close>
+ <summary>(Click for Details)</summary>
+
+ ```shell
+ # Run with the trained model
+ python3 -u run.py
+ ```
+ **run.py**
+ ```python
+ from ding.bonus import TD3Agent
+ from huggingface_ding import pull_model_from_hub
+
+ # Pull the model from the Hugging Face Hub
+ policy_state_dict, cfg = pull_model_from_hub(repo_id="OpenDILabCommunity/LunarLander-v2-TD3")
+ # Instantiate the agent
+ agent = TD3Agent(
+     env="lunarlander_continuous",
+     exp_name="LunarLander-v2-TD3",
+     cfg=cfg.exp_config,
+     policy_state_dict=policy_state_dict
+ )
+ # Continue training
+ agent.train(step=5000)
+ # Render the new agent's performance
+ agent.deploy(enable_save_replay=True)
+
+ ```
+ </details>
+
+ ## Model Training
+
+ ### Train the Model and Push to Huggingface_hub
+
+ <details close>
+ <summary>(Click for Details)</summary>
+
+ ```shell
+ # Train your own agent
+ python3 -u train.py
+ ```
+ **train.py**
+ ```python
+ from ding.bonus import TD3Agent
+ from huggingface_ding import push_model_to_hub
+
+ # Instantiate the agent
+ agent = TD3Agent("lunarlander_continuous", exp_name="LunarLander-v2-TD3")
+ # Train the agent
+ return_ = agent.train(step=4000000, collector_env_num=4, evaluator_env_num=4)
+ # Push the model to the Hugging Face Hub
+ push_model_to_hub(
+     agent=agent.best,
+     env_name="OpenAI/Gym/Box2d",
+     task_name="LunarLander-v2",
+     algo_name="TD3",
+     wandb_url=return_.wandb_url,
+     github_repo_url="https://github.com/opendilab/DI-engine",
+     github_doc_model_url="https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html",
+     github_doc_env_url="https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html",
+     installation_guide="pip3 install DI-engine[common_env]",
+     usage_file_by_git_clone="./td3/lunarlander_td3_deploy.py",
+     usage_file_by_huggingface_ding="./td3/lunarlander_td3_download.py",
+     train_file="./td3/lunarlander_td3.py",
+     repo_id="OpenDILabCommunity/LunarLander-v2-TD3"
+ )
+
+ ```
+ </details>
+
+ **Configuration**
+ <details close>
+ <summary>(Click for Details)</summary>
+
+
+ ```python
+ exp_config = {
+     'env': {
+         'manager': {
+             'episode_num': float("inf"),
+             'max_retry': 1,
+             'retry_type': 'reset',
+             'auto_reset': True,
+             'step_timeout': None,
+             'reset_timeout': None,
+             'retry_waiting_time': 0.1,
+             'cfg_type': 'BaseEnvManagerDict'
+         },
+         'stop_value': 240,
+         'env_id': 'LunarLanderContinuous-v2',
+         'collector_env_num': 4,
+         'evaluator_env_num': 8,
+         'n_evaluator_episode': 8,
+         'act_scale': True
+     },
+     'policy': {
+         'model': {
+             'twin_critic': True,
+             'obs_shape': 8,
+             'action_shape': 2,
+             'action_space': 'regression'
+         },
+         'learn': {
+             'learner': {
+                 'train_iterations': 1000000000,
+                 'dataloader': {
+                     'num_workers': 0
+                 },
+                 'log_policy': True,
+                 'hook': {
+                     'load_ckpt_before_run': '',
+                     'log_show_after_iter': 100,
+                     'save_ckpt_after_iter': 10000,
+                     'save_ckpt_after_run': True
+                 },
+                 'cfg_type': 'BaseLearnerDict'
+             },
+             'update_per_collect': 256,
+             'batch_size': 256,
+             'learning_rate_actor': 0.0003,
+             'learning_rate_critic': 0.001,
+             'ignore_done': False,
+             'target_theta': 0.005,
+             'discount_factor': 0.99,
+             'actor_update_freq': 2,
+             'noise': True,
+             'noise_sigma': 0.1,
+             'noise_range': {
+                 'min': -0.5,
+                 'max': 0.5
+             }
+         },
+         'collect': {
+             'collector': {},
+             'unroll_len': 1,
+             'noise_sigma': 0.1,
+             'n_sample': 256
+         },
+         'eval': {
+             'evaluator': {
+                 'eval_freq': 1000,
+                 'render': {
+                     'render_freq': -1,
+                     'mode': 'train_iter'
+                 },
+                 'cfg_type': 'InteractionSerialEvaluatorDict',
+                 'n_episode': 8,
+                 'stop_value': 240
+             }
+         },
+         'other': {
+             'replay_buffer': {
+                 'replay_buffer_size': 100000
+             }
+         },
+         'on_policy': False,
+         'cuda': True,
+         'multi_gpu': False,
+         'bp_update_sync': True,
+         'traj_len_inf': False,
+         'type': 'td3',
+         'priority': False,
+         'priority_IS_weight': False,
+         'random_collect_size': 10000,
+         'transition_with_policy_data': False,
+         'action_space': 'continuous',
+         'reward_batch_norm': False,
+         'multi_agent': False,
+         'cfg_type': 'TD3PolicyDict'
+     },
+     'exp_name': 'LunarLander-v2-TD3',
+     'seed': 0,
+     'wandb_logger': {
+         'gradient_logger': True,
+         'video_logger': True,
+         'plot_logger': True,
+         'action_logger': True,
+         'return_logger': False
+     }
+ }
+
+ ```
+ </details>
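
The TD3-specific entries in the `learn` section of the configuration (`noise_sigma`, `noise_range`, `target_theta`, `discount_factor`, `actor_update_freq`, `twin_critic`) map onto the standard TD3 update rules. The following is a minimal pure-Python sketch of those rules using the values from the configuration, not DI-engine's actual implementation. The helper names, the `[-1, 1]` action bounds, and the exact delayed-update schedule are assumptions made for illustration.

```python
import random

# Values copied from the 'learn' section of the configuration.
NOISE_SIGMA = 0.1                  # 'noise_sigma'
NOISE_MIN, NOISE_MAX = -0.5, 0.5   # 'noise_range'
TARGET_THETA = 0.005               # 'target_theta'
DISCOUNT_FACTOR = 0.99             # 'discount_factor'
ACTOR_UPDATE_FREQ = 2              # 'actor_update_freq'


def clip(value, low, high):
    """Clamp a scalar to the closed interval [low, high]."""
    return max(low, min(high, value))


def smoothed_target_action(target_action, rng, action_low=-1.0, action_high=1.0):
    """Target policy smoothing: add clipped Gaussian noise to each action dimension.

    The action bounds are an assumption for this sketch.
    """
    noisy = []
    for a in target_action:
        noise = clip(rng.gauss(0.0, NOISE_SIGMA), NOISE_MIN, NOISE_MAX)
        noisy.append(clip(a + noise, action_low, action_high))
    return noisy


def td_target(reward, done, q1_next, q2_next):
    """Clipped double-Q target: bootstrap from the smaller of the twin critics."""
    return reward + DISCOUNT_FACTOR * (1.0 - float(done)) * min(q1_next, q2_next)


def soft_update(target_params, online_params):
    """Polyak averaging: move target parameters a small step toward the online ones."""
    return [
        (1.0 - TARGET_THETA) * t + TARGET_THETA * o
        for t, o in zip(target_params, online_params)
    ]


def should_update_actor(train_iter):
    """Delayed policy update: one actor update per ACTOR_UPDATE_FREQ critic updates.

    The exact offset of the schedule is an assumption for this sketch.
    """
    return (train_iter + 1) % ACTOR_UPDATE_FREQ == 0


if __name__ == "__main__":
    rng = random.Random(0)
    print(smoothed_target_action([0.0, 0.95], rng))
    print(td_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0))
    print(soft_update([0.0], [1.0]))
```

Note that `noise_sigma` also appears in the `collect` section, where it would typically control exploration noise added during data collection rather than target smoothing.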
+
+ **Training Procedure**
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+ - **Weights & Biases (wandb):** [monitor link](https://wandb.ai/zjowowen/LunarLander-v2-TD3)
+
+ ## Model Information
+ <!-- Provide the basic links for the model. -->
+ - **GitHub Repository:** [repo link](https://github.com/opendilab/DI-engine)
+ - **Doc:** [DI-engine-docs Algorithm link](https://di-engine-docs.readthedocs.io/en/latest/12_policies/td3.html)
+ - **Configuration:** [config link](https://huggingface.co/OpenDILabCommunity/LunarLander-v2-TD3/blob/main/policy_config.py)
+ - **Demo:** [video](https://huggingface.co/OpenDILabCommunity/LunarLander-v2-TD3/blob/main/replay.mp4)
+ <!-- Provide the size information for the model. -->
+ - **Parameters total size:** 57.52 KB
+ - **Last Update Date:** 2023-04-17
+
+ ## Environments
+ <!-- Address questions around what environment the model is intended to be trained and deployed at, including the necessary information needed to be provided for future users. -->
+ - **Benchmark:** OpenAI/Gym/Box2d
+ - **Task:** LunarLander-v2
+ - **Gym version:** 0.25.1
+ - **DI-engine version:** v0.4.7
+ - **PyTorch version:** 1.7.1
+ - **Doc:** [DI-engine-docs Environments link](https://di-engine-docs.readthedocs.io/en/latest/13_envs/lunarlander.html)
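
To check whether a local setup matches the versions listed above, something like the following sketch may help. It only uses the standard library's `importlib.metadata`. The distribution names (`gym`, `DI-engine`, `torch`) and the helper names are assumptions made for illustration, and a mismatch does not necessarily mean the model will fail to run.

```python
from importlib import metadata

# Versions listed in the Environments section.
# The distribution names used as keys are assumptions for this sketch.
EXPECTED_VERSIONS = {
    "gym": "0.25.1",
    "DI-engine": "0.4.7",
    "torch": "1.7.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package to (expected, found, matches).

    A local build suffix such as '+cu110' is tolerated when matching.
    """
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        matches = found is not None and (
            found == wanted or found.startswith(wanted + "+")
        )
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in compare_versions(EXPECTED_VERSIONS).items():
        status = "ok" if ok else "differs"
        print(f"{name}: expected {wanted}, found {found} ({status})")
```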