---
tags:
- LunarLander-v2
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- custom-implementation
- deep-rl-class
model-index:
- name: PPO
  results:
  - metrics:
    - type: mean_reward
      value: -121.77 +/- 30.58
      name: mean_reward
    task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: LunarLander-v2
      type: LunarLander-v2
---
# PPO Agent Playing LunarLander-v2
This is a trained PPO agent that plays LunarLander-v2.
To learn how to write and train your own PPO agent,
see Unit 8 of the Deep Reinforcement Learning Class: https://github.com/huggingface/deep-rl-class/tree/main/unit8
# Hyperparameters
```python
{'exp_name': 'ppo',
 'seed': 1,
 'torch_deterministic': True,
 'cuda': True,
 'track': False,
 'wandb_project_name': 'cleanRL',
 'wandb_entity': None,
 'capture_video': False,
 'env_id': 'LunarLander-v2',
 'total_timesteps': 50000,
 'learning_rate': 0.00025,
 'num_envs': 4,
 'num_steps': 128,
 'anneal_lr': True,
 'gae': True,
 'gamma': 0.99,
 'gae_lambda': 0.95,
 'num_minibatches': 4,
 'update_epochs': 4,
 'norm_adv': True,
 'clip_coef': 0.2,
 'clip_vloss': True,
 'ent_coef': 0.01,
 'vf_coef': 0.5,
 'max_grad_norm': 0.5,
 'target_kl': None,
 'repo_id': 'sun1638650145/PyTorch-PPO-LunarLander-v2',
 'batch_size': 512,
 'minibatch_size': 128}
```
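The last two entries, `batch_size` and `minibatch_size`, appear to be derived from the other settings rather than chosen independently. The sketch below reproduces them under the assumption that the training script follows the usual CleanRL-style conventions, which this card does not state explicitly. The `num_updates` variable and the `annealed_lr` helper are illustrative names, and the linear decay is an assumed reading of `anneal_lr: True`.

```python
# Values copied from the hyperparameter listing above.
num_envs = 4
num_steps = 128
num_minibatches = 4
total_timesteps = 50000
learning_rate = 0.00025

# Assumed CleanRL-style derivations (not stated in this card).
batch_size = num_envs * num_steps               # transitions collected per rollout
minibatch_size = batch_size // num_minibatches  # transitions per gradient step
num_updates = total_timesteps // batch_size     # complete rollout/update cycles


def annealed_lr(update: int) -> float:
    """Learning rate for a 1-indexed update, assuming linear decay toward zero."""
    frac = 1.0 - (update - 1.0) / num_updates
    return frac * learning_rate


print(batch_size, minibatch_size, num_updates)  # 512 128 97
```

Under these assumptions a 50,000-step budget covers only 97 policy updates, which is a short run for this environment and may help explain the low reported mean reward.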