2023.11.06(v0.5.0)

- env: add tabmwp env (
- env: polish anytrading env issues (
- algo: add PromptPG algorithm (
- algo: add Plan Diffuser algorithm (
- algo: add new pipeline implementation of IMPALA algorithm (
- algo: add dropout layers to DQN-style algorithms, sketched after this list (
- feature: add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (
- feature: add more unittest cases for model (
- feature: add collector logging in new pipeline (
- fix: logger middleware problems (
- fix: ppo parallel bug (
- fix: typo in optimizer_helper.py (
- fix: mlp dropout if condition bug
- fix: drex collecting data unittest bugs
- style: polish env manager/wrapper comments and API doc (
- style: polish model comments and API doc (
- style: polish policy comments and API doc (
- style: polish rl_utils comments and API doc (
- style: polish torch_utils comments and API doc (
- style: update README.md and Colab demo (
- style: update metaworld docker image

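The dropout entry above only names the change. As a rough, hedged illustration (the layer sizes and the `dropout_p` argument are made up here, not DI-engine's actual MLP or DQN signature), a DQN-style Q-head with optional dropout can look like this:

```python
# Minimal sketch (not DI-engine's actual API): an MLP Q-head where each hidden
# layer is optionally followed by nn.Dropout.
import torch
import torch.nn as nn


def mlp_q_head(obs_dim: int, action_dim: int, hidden: int = 128, dropout_p: float = 0.1) -> nn.Sequential:
    layers = []
    in_dim = obs_dim
    for _ in range(2):  # two hidden layers; sizes are illustrative
        layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
        if dropout_p > 0:  # dropout is only inserted when a positive rate is given
            layers.append(nn.Dropout(p=dropout_p))
        in_dim = hidden
    layers.append(nn.Linear(in_dim, action_dim))  # final layer outputs one Q-value per action
    return nn.Sequential(*layers)


q_net = mlp_q_head(obs_dim=4, action_dim=2)
q_values = q_net(torch.randn(8, 4))  # (batch, action_dim)
```
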
2023.08.23(v0.4.9)

- env: add cliffwalking env (
- env: add lunarlander ppo config and example
- algo: add BCQ offline RL algorithm (
- algo: add DreamerV3 model-based RL algorithm (
- algo: add tensor stream merge network tools (
- algo: add scatter connection model (
- algo: refactor Decision Transformer in new pipeline and support image input and discrete output (
- algo: add three variants of Bilinear classes and a FiLM class, sketched after this list (
- feature: polish offpolicy RL multi-gpu DDP training (
- feature: add middleware for Ape-X distributed pipeline (
- feature: add example for evaluating trained DQN (
- fix: to_ndarray fails to assign dtype for scalars (
- fix: evaluator return episode_info compatibility bug
- fix: cql example entry wrong config bug
- fix: enable_save_figure env interface
- fix: redundant env info bug in evaluator
- fix: to_item unittest bug
- style: polish and simplify requirements (
- style: add Hugging Face Model Zoo badge (
- style: add openxlab Model Zoo badge (
- style: fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (
- style: fix mujoco-py compatibility issue for cython<3 (
- style: fix type spell error (
- style: fix pypi release actions ubuntu 18.04 bug
- style: update contact information (e.g. wechat)
- style: polish algorithm doc tables

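For the Bilinear/FiLM entry above, a minimal FiLM (Feature-wise Linear Modulation) layer is sketched below; the class name, shapes, and vector conditioning input are illustrative assumptions, not DI-engine's actual implementation.

```python
# Sketch of a FiLM layer: a conditioning vector predicts a per-feature scale (gamma)
# and shift (beta) that modulate the input features.
import torch
import torch.nn as nn


class FiLM(nn.Module):
    def __init__(self, feature_dim: int, cond_dim: int):
        super().__init__()
        # one linear layer predicts both gamma and beta from the conditioning input
        self.affine = nn.Linear(cond_dim, 2 * feature_dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.affine(cond).chunk(2, dim=-1)
        return gamma * x + beta  # feature-wise modulation


film = FiLM(feature_dim=64, cond_dim=16)
out = film(torch.randn(8, 64), torch.randn(8, 16))  # (8, 64)
```
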
2023.05.25(v0.4.8)

- env: fix gym hybrid reward dtype bug (
- env: fix atari env id noframeskip bug (
- env: fix typo in gym any_trading env (
- env: update td3bc d4rl config (
- env: polish bipedalwalker config
- algo: add EDAC offline RL algorithm (
- algo: add LN and GN norm_type support in ResBlock (
- algo: add normal value norm baseline for PPOF (
- algo: polish last layer init/norm in MLP (
- algo: polish TD3 monitor variable
- feature: add MAPPO/MASAC task example (
- feature: add PPO example for complex env observation (
- feature: add barrier middleware (
- fix: abnormal collector log and add record_random_collect option (
- fix: to_item compatibility bug (
- fix: trainer dtype transform compatibility bug
- fix: pettingzoo 1.23.0 compatibility bug
- fix: ensemble head unittest bug
- style: fix incompatible gym version bug in Dockerfile.env (
- style: add more algorithm docs

2023.04.11(v0.4.7)

- env: add dmc2gym env support and baseline (
- env: update pettingzoo to the latest version (
- env: polish icm/rnd+onppo config bugs and add app_door_to_key env (
- env: add lunarlander continuous TD3/SAC config
- env: polish lunarlander discrete C51 config
- algo: add Procedure Cloning (PC) imitation learning algorithm (
- algo: add Munchausen Reinforcement Learning (MDQN) algorithm (
- algo: add reward/value norm methods: popart & value rescale & symlog, sketched after this list (
- algo: polish reward model config and training pipeline (
- algo: add PPOF reward space demo support (
- algo: add PPOF Atari demo support (
- algo: polish dqn default config and env examples (
- algo: polish comment and clean code about SAC
- feature: add language model (e.g. GPT) training utils (
- feature: remove policy cfg sub fields requirements (
- feature: add full wandb support (
- fix: confusing shallow copy operation about next_obs (
- fix: unsqueeze action_args in PDQN when shape is 1 (
- fix: evaluator return_info tensor type bug (
- fix: deque buffer wrapper PER bug (
- fix: reward model save method compatibility bug
- fix: logger assertion and unittest bug
- fix: bfs test py3.9 compatibility bug
- fix: zergling collector unittest bug
- style: add DI-engine torch-rpc p2p communication docker (
- style: add D4RL docker (
- style: correct typo in task (
- style: correct typo in time_helper (
- style: polish readme and add treetensor example
- style: update contributing doc

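The popart & value rescale & symlog entry above refers to standard return/value transforms. A standalone sketch of two of them follows (PopArt needs running statistics and is omitted); the function names and the eps default are illustrative assumptions, though the formulas themselves are the published ones.

```python
# symlog/symexp (DreamerV3-style) and the Pohlen et al. value-rescale pair used for
# R2D2-style targets, written as standalone helpers.
import torch


def symlog(x: torch.Tensor) -> torch.Tensor:
    # symlog(x) = sign(x) * ln(|x| + 1): compresses large magnitudes symmetrically
    return torch.sign(x) * torch.log(torch.abs(x) + 1)


def symexp(x: torch.Tensor) -> torch.Tensor:
    # inverse of symlog
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)


def value_rescale(x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + eps * x


def inverse_value_rescale(x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    # closed-form inverse of value_rescale
    return torch.sign(x) * (((torch.sqrt(1 + 4 * eps * (torch.abs(x) + 1 + eps)) - 1) / (2 * eps)) ** 2 - 1)


y = value_rescale(torch.tensor([10.0]))
x = inverse_value_rescale(y)  # round-trips up to float error
```
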
2023.02.16(v0.4.6)

- env: add metadrive env and related ppo config (
- env: add acrobot env and related dqn config (
- env: add carracing in box2d (
- env: add new gym hybrid viz (
- env: update cartpole IL config (
- algo: add BDQ algorithm (
- algo: add procedure cloning model (
- feature: add simplified PPOF (PPO × Family) interface (
- fix: to_device and prev_state bug when using ttorch (
- fix: py38 and numpy unittest bugs (
- fix: typo in contrastive_loss.py (
- fix: dizoo envs pkg installation bugs
- fix: multi_trainer middleware unittest bug
- style: add evogym docker (
- style: fix metaworld docker bug
- style: fix setuptools high version incompatibility bug
- style: extend treetensor lowest version

2022.12.13(v0.4.5)

- env: add beergame supply chain optimization env (
- env: add env gym_pybullet_drones (
- env: rename eval reward to episode return (
- algo: add policy gradient algo implementation, sketched after this list (
- algo: add MADDPG algo implementation (
- algo: add IMPALA continuous algo implementation (
- algo: add MADQN algo implementation (
- feature: add new task IMPALA-type distributed training scheme (
- feature: add load and save method for replaybuffer (
- feature: add more DingEnvWrapper examples (
- feature: add more info visualization support in evaluator (
- feature: add traceback log for subprocess env manager (
- fix: halfcheetah td3 config file (
- fix: mujoco action_clip args compatibility bug (
- fix: atari a2c config entry bug
- fix: drex unittest compatibility bug
- style: add Roadmap issue of DI-engine (
- style: update related project link and new env doc

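As a reference for the policy gradient entry above, the vanilla REINFORCE objective it corresponds to is loss = -E[log pi(a|s) * G]. A minimal sketch follows; the tensor names are illustrative, not DI-engine's data fields.

```python
# REINFORCE-style policy gradient loss: push up the log-probability of actions
# in proportion to the (optionally baseline-subtracted) return they received.
import torch


def pg_loss(log_prob: torch.Tensor, episode_return: torch.Tensor) -> torch.Tensor:
    # maximize E[log pi(a|s) * G]  <=>  minimize its negative
    return -(log_prob * episode_return).mean()


log_prob = torch.randn(32, requires_grad=True)  # log pi(a_t | s_t) from the policy
episode_return = torch.randn(32)                # return G_t for each sample
loss = pg_loss(log_prob, episode_return)
loss.backward()
```
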
2022.10.31(v0.4.4)

- env: add modified gym-hybrid including moving, sliding and hardmove (
- env: add evogym support (
- env: add save_replay_gif option (
- env: adapt minigrid_env and related config to latest MiniGrid v2.0.0 (
- algo: add pcgrad optimizer, sketched after this list (
- algo: add some features in MLP and ResBlock (
- algo: delete mcts related modules (
- feature: add wandb middleware and demo (
- feature: add new properties in Context (
- feature: add single env policy wrapper for policy deployment
- feature: add custom model demo and doc
- fix: build logger args and unittests (
- fix: total_loss calculation in PDQN (
- fix: save gif function bug
- fix: level sample unittest bug
- style: update contact email address (
- style: polish env log and resblock name
- style: add details button in readme

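The pcgrad entry above refers to PCGrad (Yu et al., 2020), which removes the conflicting component between per-task gradients before combining them. Below is a sketch of the core projection on flat gradient vectors, with the optimizer plumbing omitted; it is not DI-engine's actual optimizer class.

```python
# PCGrad idea: for each task gradient, project away its component along any other
# task gradient it conflicts with (negative dot product), then sum the results.
import random
import torch


def pcgrad_combine(task_grads: list) -> torch.Tensor:
    projected = []
    for g_i in task_grads:
        g = g_i.clone()
        others = [g_j for g_j in task_grads if g_j is not g_i]
        random.shuffle(others)  # the paper processes the other tasks in random order
        for g_j in others:
            dot = torch.dot(g, g_j)
            if dot < 0:  # conflict: subtract the projection of g onto g_j
                g = g - dot / (g_j.norm() ** 2 + 1e-12) * g_j
        projected.append(g)
    return torch.stack(projected).sum(dim=0)


combined = pcgrad_combine([torch.randn(10), torch.randn(10)])
```
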
2022.09.23(v0.4.3)

- env: add rule-based gomoku expert (
- algo: fix a2c policy batch size bug (
- algo: enable activation option in collaq attention and mixer
- algo: minor fix about IBC (
- feature: add IGM support (
- feature: add tb logger middleware and demo
- fix: type conversion in ding_env_wrapper (
- fix: di-orchestrator version bug in unittest (
- fix: data collection errors caused by shallow copies (
- fix: gym==0.26.0 seed args bug (the new reset/step API is sketched after this list)
- style: add readme tutorial link (environment & algorithm) (
- style: adjust location of the default_model method in policy (

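The gym==0.26.0 entry above concerns the changed gym API: env.seed() is removed, seeding moves into reset(), and step() returns five values. A reference snippet of the new-style calls follows; the exact fix inside DI-engine's wrappers may differ.

```python
# gym >= 0.26 calling convention: seed via reset(), 5-tuple from step().
import gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)                     # seed is now a reset() argument
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated                    # reconstruct the old-style "done" flag
env.close()
```
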
2022.09.08(v0.4.2)

- env: add rocket env (
- env: update pettingzoo env and improve related performance (
- env: add mario env demo (
- env: add MAPPO multi-agent config (
- env: add mountain car (discrete action) environment (
- env: fix multi-agent mujoco gym compatibility bug
- env: fix gfootball env save_replay variable init bug
- algo: add IBC (Implicit Behaviour Cloning) algorithm (
- algo: add BCO (Behaviour Cloning from Observation) algorithm (
- algo: add continuous PPOPG algorithm (
- algo: add PER in CollaQ (
- algo: add activation option in QMIX and CollaQ
- feature: update ctx to dataclass (
- fix: base_env FinalMeta bug about gym 0.25.0-0.25.1
- fix: config inplace modification bug
- fix: ding cli no argument problem
- fix: import errors after running setup.py (jinja2, markupsafe)
- fix: conda py3.6 and cross platform build bug
- style: add project state and datetime in log dir (
- style: polish notes for q-learning model (
- style: revise mujoco dockerfile and validation (
- style: add dockerfile for cityflow env
- style: polish default output log format

2022.08.12(v0.4.1)

- env: add gym trading env (
- env: add board games env (tictactoe, gomoku, chess) (
- env: add sokoban env (
- env: add BC and DQN demo for gfootball (
- env: add discrete pendulum env (
- algo: add STEVE model-based algorithm (
- algo: add PLR algorithm (
- algo: plug ST-DIM into PPO (
- feature: add final result saving in training pipeline
- fix: random policy randomness bug
- fix: action_space seed compatibility bug
- fix: discard message sent by self in redis mq (
- fix: remove pace controller (
- fix: import error in serial_pipeline_trex (
- fix: unittest hang and fail bug (
- fix: DREX collect data unittest bug
- fix: remove unused import cv2
- fix: ding CLI env/policy option bug
- style: upgrade Python version from 3.6-3.8 to 3.7-3.9
- style: upgrade gym version from 0.20.0 to 0.25.0
- style: upgrade torch version from 1.10.0 to 1.12.0
- style: upgrade mujoco bin from 2.0.0 to 2.1.0
- style: add buffer api description (
- style: polish VAE comments (
- style: unittest for FQF (
- style: add metaworld dockerfile (
- style: remove opencv requirement in default setting
- style: update long description in setup.py

2022.06.21(v0.4.0)

- env: add all MAPPO/MASAC configs in SMAC (
- env: add dmc2gym env (
- env: remove DI-star requirements of dizoo/smac, use official pysc2 (
- env: add latest GAIL mujoco config (
- env: polish procgen env (
- env: add MBPO ant and humanoid config (
- env: fix slime volley env obs space bug when agent_vs_agent
- env: fix smac env obs space bug
- env: fix import path error in lunarlander (
- algo: add Decision Transformer algorithm (
- algo: add on-policy PPG algorithm (
- algo: add DDPPO & add model-based SAC with lambda-return algorithm (
- algo: add infoNCE loss and ST-DIM algorithm, sketched after this list (
- algo: add FQF distributional RL algorithm (
- algo: add continuous BC algorithm (
- algo: add pure policy gradient PPO algorithm (
- algo: add SQIL + SAC algorithm (
- algo: polish NGU and related modules (
- algo: add marl distributional td loss (
- feature: add new worker middleware (
- feature: refactor model-based RL pipeline (ding/world_model) (
- feature: refactor logging system in the whole DI-engine (
- feature: add env supervisor design (
- feature: support async reset for envpool env manager (
- feature: add log videos to tensorboard (
- feature: refactor impala cnn encoder interface (
- fix: env save replay bug
- fix: transformer mask inplace operation bug
- fix: transition_with_policy_data bug in SAC and PPG
- style: add dockerfile for ding:hpc image (
- style: fix mpire 2.3.5, which handles default processes more elegantly (
- style: use FORMAT_DIR instead of ./ding (
- style: update quickstart colab link (
- style: polish comments in ding/model/common (
- style: update mujoco docker download path (
- style: fix protobuf new version compatibility bug
- style: fix torch 1.8.0 torch.div compatibility bug
- style: update doc links in readme
- style: add outline in readme and update wechat image
- style: update head image and refactor docker dir

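For the infoNCE/ST-DIM entry above, a minimal InfoNCE loss is sketched below: paired embeddings are contrasted against all other pairs in the batch. The shapes and temperature are illustrative, not the values used by DI-engine.

```python
# InfoNCE: similarity of the i-th anchor with the i-th positive should beat its
# similarity with every other positive in the batch (cross-entropy over the rows).
import torch
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    logits = anchor @ positive.t() / temperature   # (N, N) similarity matrix
    labels = torch.arange(anchor.shape[0])         # the i-th positive matches the i-th anchor
    return F.cross_entropy(logits, labels)


loss = info_nce(torch.randn(16, 64), torch.randn(16, 64))
```
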
2022.04.23(v0.3.1)

- env: polish and standardize dizoo config (
- env: add GRF academic env and config (
- env: update env interface of GRF (
- env: update D4RL offline RL env and config (
- env: polish PomdpAtariEnv (
- algo: DREX algorithm (
- feature: separate mq and parallel modules, add redis (
- feature: rename env variables; fix attach_to parameter (
- feature: env implementation check (
- feature: adjust and set the max column number of tabulate in log (
- feature: add drop_extra option for sample collect
- feature: speed up GTrXL forward method + GRU unittest (
- fix: add act_scale in DingEnvWrapper; fix envpool env manager (
- fix: auto_reset=False and env_ref bug in env manager (
- fix: data type and deepcopy bug in RND (
- fix: share_memory bug and multi_mujoco env (
- fix: some bugs in GTrXL (
- fix: update gym_vector_env_manager and add more unittest (
- fix: mdpolicy random collect bug (
- fix: gym.wrapper save video replay bug
- fix: collect abnormal step format bug and add unittest
- test: add buffer benchmark & socket test (
- style: upgrade mpire (
- style: add GRF (Google Research Football) docker (
- style: update policy and GAIL comments

2022.03.24(v0.3.0)

- env: add bitflip HER DQN benchmark (
- env: slime volley league training demo (
- algo: Gated Transformer-XL (GTrXL) algorithm (
- algo: TD3 + VAE (HyAR) latent action algorithm (
- algo: stochastic dueling network, sketched after this list (
- algo: use log prob instead of prob in ACER (
- feature: support envpool env manager (
- feature: add league main and other improvements in new framework (
- feature: add pace controller middleware in new framework (
- feature: add auto recover option in new framework (
- feature: add k8s parser in new framework (
- feature: support async event handler and logger (
- feature: add grad norm calculator (
- feature: add gym vector env manager (
- feature: add train_iter and env_step in serial pipeline (
- feature: add rich logger handler (
- feature: add naive lr_scheduler demo
- refactor: new BaseEnv and DingEnvWrapper (
- polish: MAPPO and MASAC smac config (
- polish: QMIX smac config (
- polish: R2D2 atari config (
- polish: A2C atari config (
- polish: GAIL box2d and mujoco config (
- polish: ACER atari config (
- polish: SQIL atari config (
- polish: TREX atari/mujoco config
- polish: IMPALA atari config
- polish: MBPO/D4PG mujoco config
- fix: random_collect compatible with episode collector (
- fix: remove default n_sample/n_episode value in policy config (
- fix: PDQN model bug on gpu device (
- fix: TREX algorithm CLI bug (
- fix: DQfD JE computation bug and move to AdamW optimizer (
- fix: pytest problem for parallel middleware (
- fix: mujoco numpy compatibility bug
- fix: markupsafe 2.1.0 bug
- fix: framework parallel module network emit bug
- fix: mpire bug and disable algotest in py3.8
- fix: lunarlander env import and env_id bug
- fix: icm unittest repeat name bug
- fix: buffer thruput close bug
- test: resnet unittest (
- test: SAC/SQN unittest (
- test: CQL/R2D3/GAIL unittest (
- test: NGU td unittest (
- test: model wrapper unittest (
- test: MAQAC model unittest (
- style: add doc docker (

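The stochastic dueling network entry above follows the idea used for continuous-action critics in the ACER paper: Q(s, a) is estimated as V(s) + A(s, a) minus the average advantage of actions sampled from the current policy. A rough sketch with stand-in networks (not DI-engine's model code):

```python
# Stochastic dueling estimate: Q(s, a) ~= V(s) + A(s, a) - (1/n) * sum_i A(s, u_i),
# where u_i are sampled from the current policy.
import torch
import torch.nn as nn


def stochastic_dueling_q(v_s, adv_fn, state, action, policy_dist, num_samples: int = 8):
    sampled_actions = policy_dist.sample((num_samples,))
    sampled_adv = torch.stack([adv_fn(state, u) for u in sampled_actions]).mean(dim=0)
    return v_s + adv_fn(state, action) - sampled_adv


# stand-in value/advantage networks and policy distribution, only to make the sketch run
state, action = torch.randn(4, 8), torch.randn(4, 2)
adv_net, value_net = nn.Linear(10, 1), nn.Linear(8, 1)


def adv_fn(s, a):
    return adv_net(torch.cat([s, a], dim=-1))


policy_dist = torch.distributions.Normal(torch.zeros(4, 2), torch.ones(4, 2))
q_value = stochastic_dueling_q(value_net(state), adv_fn, state, action, policy_dist)  # (4, 1)
```
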
2022.01.01(v0.2.3)

- env: add multi-agent mujoco env (
- env: add delay reward mujoco env (
- env: fix port conflict in gym_soccer (
- algo: MASAC algorithm (
- algo: TREX algorithm (
- algo: H-PPO hybrid action space algorithm (
- algo: residual link in R2D2 (
- algo: gumbel softmax, sketched after this list (
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (
- feature: refactor buffer, separate algorithm and storage (
- feature: cli in new pipeline (ditask) (
- feature: add multiprocess tblogger, fix circular reference problem (
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (
- fix: R2D3 abs priority problem (
- fix: multi-discrete action space policies random action bug (
- fix: doc generation bug with enum_tools (
- style: more comments about R2D2 (
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (
- style: update Chinese doc link
- style: update kaggle tutorial link

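For the gumbel softmax entry above, the sampling trick is: add Gumbel(0, 1) noise to the logits and take a temperature-controlled softmax, which gives a differentiable approximation of a categorical sample. A minimal sketch (the temperature value is illustrative; PyTorch also ships torch.nn.functional.gumbel_softmax):

```python
# Gumbel-softmax sampling: perturb logits with Gumbel noise, then softmax with temperature.
import torch
import torch.nn.functional as F


def gumbel_softmax_sample(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    uniform = torch.rand_like(logits)
    gumbel_noise = -torch.log(-torch.log(uniform + 1e-20) + 1e-20)  # Gumbel(0, 1) samples
    return F.softmax((logits + gumbel_noise) / temperature, dim=-1)


soft_sample = gumbel_softmax_sample(torch.randn(4, 6))  # each row sums to 1
```
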
2021.12.03(v0.2.2)

- env: apple key to door treasure env (
- env: add bsuite memory benchmark (
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (
- algo: ICM exploration algorithm (
- algo: MP-DQN hybrid action space algorithm (
- algo: add loss statistics and polish r2d3 pong config (
- feature: add renew env mechanism in env manager and update timeout mechanism (
- fix: async subprocess env manager reset bug (
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list

2021.11.22(v0.2.1)

- env: gym-hybrid env (
- env: gym-soccer (HFO) env (
- env: Go-Bigger env baseline (
- env: add bipedalwalker configs for sac and ppo (
- algo: DQfD Imitation Learning algorithm (
- algo: TD3BC offline RL algorithm (
- algo: MBPO model-based RL algorithm (
- algo: PADDPG hybrid action space algorithm (
- algo: PDQN hybrid action space algorithm (
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (
- algo: self-play training demo in slime_volley env (
- algo: add example of GAIL entry + config for mujoco (
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single machine multi-GPU
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (
- test: add unittest for dataset and evaluator (
- test: add unittest for on-policy algorithm (
- test: add unittest for ppo and td (MARL case) (
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (
- fix: pyyaml version bug (
- fix: small fix on bsuite environment (
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0

2021.9.30(v0.2.0)

- env: overcooked env (
- env: procgen env (
- env: modified predator env (
- env: d4rl env (
- env: imagenet dataset (
- env: bsuite env (
- env: move atari_py to ale-py
- algo: SQIL algorithm (
- algo: CQL algorithm (discrete/continuous) (
- algo: MAPPO algorithm (
- algo: WQMIX algorithm (
- algo: D4PG algorithm (
- algo: update multi discrete policy (dqn, ppo, rainbow) (
- feature: image classification training pipeline (
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo), with an Elo sketch after this list (
- feature: add tb in naive buffer and modify tb in advanced buffer (
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (
- feature: add hyper-parameter scheduler module (
- feature: add plot function (
- fix: acer bug and update atari result (
- fix: mappo nan bug and dict obs cannot unsqueeze bug (
- fix: r2d2 hidden state and obs arange bug (
- fix: ppo bug when using dual_clip and adv > 0
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (
- fix: formatted config no eval bug (
- fix: catch statements that will never succeed and system proxy bug (
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced

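The league metric entry above mentions Elo; its update is a fixed formula: the expected score comes from the rating gap, and the rating moves by a K-factor times (actual - expected). A small sketch (the K value and ratings are illustrative, and DI-engine's league metric module may wrap this differently):

```python
# Elo rating update for a single pairwise result.
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    # score_a: 1.0 win, 0.5 draw, 0.0 loss, from player A's perspective
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


print(elo_update(1200.0, 1000.0, 1.0))  # the favorite winning changes ratings only slightly
```
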
2021.8.3(v0.1.1)

- env: selfplay/league demo (
- env: pybullet env (
- env: minigrid env (
- env: atari enduro config (
- algo: on-policy PPO (
- algo: ACER algorithm (
- feature: polish experiment directory structure (
- refactor: split doc to new repo (
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (
- style: code/comment statistics badge
- style: github CI workflow

2021.7.8(v0.1.0)