2023.11.06(v0.5.0)
- env: add tabmwp env (#667)
- env: polish anytrading env issues (#731)
- algo: add PromptPG algorithm (#667)
- algo: add Plan Diffuser algorithm (#700)
- algo: add new pipeline implementation of IMPALA algorithm (#713)
- algo: add dropout layers to DQN-style algorithms (#712)
- feature: add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (#637) (#730) (#737)
- feature: add more unittest cases for model (#728)
- feature: add collector logging in new pipeline (#735)
- fix: logger middleware problems (#715)
- fix: ppo parallel bug (#709)
- fix: typo in optimizer_helper.py (#726)
- fix: mlp dropout if condition bug
- fix: drex collecting data unittest bugs
- style: polish env manager/wrapper comments and API doc (#742)
- style: polish model comments and API doc (#722) (#729) (#734) (#736) (#741)
- style: polish policy comments and API doc (#732)
- style: polish rl_utils comments and API doc (#724)
- style: polish torch_utils comments and API doc (#738)
- style: update README.md and Colab demo (#733)
- style: update metaworld docker image

2023.08.23(v0.4.9)
- env: add cliffwalking env (#677)
- env: add lunarlander ppo config and example
- algo: add BCQ offline RL algorithm (#640)
- algo: add Dreamerv3 model-based RL algorithm (#652)
- algo: add tensor stream merge network tools (#673)
- algo: add scatter connection model (#680)
- algo: refactor Decision Transformer in new pipeline and support img input and discrete output (#693)
- algo: add three variants of Bilinear classes and a FiLM class (#703)
- feature: polish offpolicy RL multi-gpu DDP training (#679)
- feature: add middleware for Ape-X distributed pipeline (#696)
- feature: add example for evaluating trained DQN (#706)
- fix: to_ndarray fails to assign dtype for scalars (#708)
- fix: evaluator return episode_info compatibility bug
- fix: cql example entry wrong config bug
- fix: enable_save_figure env interface
- fix: redundant env info bug in evaluator
- fix: to_item unittest bug
- style: polish and simplify requirements (#672)
- style: add Hugging Face Model Zoo badge (#674)
- style: add openxlab Model Zoo badge (#675)
- style: fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (#678)
- style: fix mujoco-py compatibility issue for cython<3 (#711)
- style: fix type spell error (#704)
- style: fix pypi release actions ubuntu 18.04 bug
- style: update contact information (e.g. wechat)
- style: polish algorithm doc tables

2023.05.25(v0.4.8)
- env: fix gym hybrid reward dtype bug (#664)
- env: fix atari env id noframeskip bug (#655)
- env: fix typo in gym any_trading env (#654)
- env: update td3bc d4rl config (#659)
- env: polish bipedalwalker config
- algo: add EDAC offline RL algorithm (#639)
- algo: add LN and GN norm_type support in ResBlock (#660)
- algo: add normal value norm baseline for PPOF (#658)
- algo: polish last layer init/norm in MLP (#650)
- algo: polish TD3 monitor variable
- feature: add MAPPO/MASAC task example (#661)
- feature: add PPO example for complex env observation (#644)
- feature: add barrier middleware (#570)
- fix: abnormal collector log and add record_random_collect option (#662)
- fix: to_item compatibility bug (#646)
- fix: trainer dtype transform compatibility bug
- fix: pettingzoo 1.23.0 compatibility bug
- fix: ensemble head unittest bug
- style: fix incompatible gym version bug in Dockerfile.env (#653)
- style: add more algorithm docs

2023.04.11(v0.4.7)
- env: add dmc2gym env support and baseline (#451)
- env: update pettingzoo to the latest version (#597)
- env: polish icm/rnd+onppo config bugs and add app_door_to_key env (#564)
- env: add lunarlander continuous TD3/SAC config
- env: polish lunarlander discrete C51 config
- algo: add Procedure Cloning (PC) imitation learning algorithm (#514)
- algo: add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
- algo: add reward/value norm methods: popart & value rescale & symlog (#605)
- algo: polish reward model config and training pipeline (#624)
- algo: add PPOF reward space demo support (#608)
- algo: add PPOF Atari demo support (#589)
- algo: polish dqn default config and env examples (#611)
- algo: polish comments and clean code about SAC
- feature: add language model (e.g. GPT) training utils (#625)
- feature: remove policy cfg sub fields requirements (#620)
- feature: add full wandb support (#579)
- fix: confusing shallow copy operation about next_obs (#641)
- fix: unsqueeze action_args in PDQN when shape is 1 (#599)
- fix: evaluator return_info tensor type bug (#592)
- fix: deque buffer wrapper PER bug (#586)
- fix: reward model save method compatibility bug
- fix: logger assertion and unittest bug
- fix: bfs test py3.9 compatibility bug
- fix: zergling collector unittest bug
- style: add DI-engine torch-rpc p2p communication docker (#628)
- style: add D4RL docker (#591)
- style: correct typo in task (#617)
- style: correct typo in time_helper (#602)
- style: polish readme and add treetensor example
- style: update contributing doc

2023.02.16(v0.4.6)
- env: add metadrive env and related ppo config (#574)
- env: add acrobot env and related dqn config (#577)
- env: add carracing in box2d (#575)
- env: add new gym hybrid viz (#563)
- env: update cartpole IL config (#578)
- algo: add BDQ algorithm (#558)
- algo: add procedure cloning model (#573)
- feature: add simplified PPOF (PPO × Family) interface (#567) (#568) (#581) (#582)
- fix: to_device and prev_state bug when using ttorch (#571)
- fix: py38 and numpy unittest bugs (#565)
- fix: typo in contrastive_loss.py (#572)
- fix: dizoo envs pkg installation bugs
- fix: multi_trainer middleware unittest bug
- style: add evogym docker (#580)
- style: fix metaworld docker bug
- style: fix setuptools high version incompatibility bug
- style: extend treetensor lowest version

2022.12.13(v0.4.5)
- env: add beergame supply chain optimization env (#512)
- env: add env gym_pybullet_drones (#526)
- env: rename eval reward to episode return (#536)
- algo: add policy gradient algo implementation (#544)
- algo: add MADDPG algo implementation (#550)
- algo: add IMPALA continuous algo implementation (#551)
- algo: add MADQN algo implementation (#540)
- feature: add new task IMPALA-type distributed training scheme (#321)
- feature: add load and save method for replaybuffer (#542)
- feature: add more DingEnvWrapper example (#525)
- feature: add evaluator more info viz support (#538)
- feature: add traceback log for subprocess env manager (#534)
- fix: halfcheetah td3 config file (#537)
- fix: mujoco action_clip args compatibility bug (#535)
- fix: atari a2c config entry bug
- fix: drex unittest compatibility bug
- style: add Roadmap issue of DI-engine (#548)
- style: update related project link and new env doc

2022.10.31(v0.4.4)
- env: add modified gym-hybrid including moving, sliding and hardmove (#505) (#519)
- env: add evogym support (#495) (#527)
- env: add save_replay_gif option (#506)
- env: adapt minigrid_env and related config to latest MiniGrid v2.0.0 (#500)
- algo: add pcgrad optimizer (#489)
- algo: add some features in MLP and ResBlock (#511)
- algo: delete mcts related modules (#518)
- feature: add wandb middleware and demo (#488) (#523) (#528)
- feature: add new properties in Context (#499)
- feature: add single env policy wrapper for policy deployment
- feature: add custom model demo and doc
- fix: build logger args and unittests (#522)
- fix: total_loss calculation in PDQN (#504)
- fix: save gif function bug
- fix: level sample unittest bug
- style: update contact email address (#503)
- style: polish env log and resblock name
- style: add details button in readme

2022.09.23(v0.4.3)
- env: add rule-based gomoku expert (#465)
- algo: fix a2c policy batch size bug (#481)
- algo: enable activation option in collaq attention and mixer
- algo: minor fix about IBC (#477)
- feature: add IGM support (#486)
- feature: add tb logger middleware and demo
- fix: the type conversion in ding_env_wrapper (#483)
- fix: di-orchestrator version bug in unittest (#479)
- fix: data collection errors caused by shallow copies (#475)
- fix: gym==0.26.0 seed args bug
- style: add readme tutorial link (environment & algorithm) (#490) (#493)
- style: adjust location of the default_model method in policy (#453)

2022.09.08(v0.4.2)
- env: add rocket env (#449)
- env: update pettingzoo env and improve related performance (#457)
- env: add mario env demo (#443)
- env: add MAPPO multi-agent config (#464)
- env: add mountain car (discrete action) environment (#452)
- env: fix multi-agent mujoco gym compatibility bug
- env: fix gfootball env save_replay variable init bug
- algo: add IBC (Implicit Behaviour Cloning) algorithm (#401)
- algo: add BCO (Behaviour Cloning from Observation) algorithm (#270)
- algo: add continuous PPOPG algorithm (#414)
- algo: add PER in CollaQ (#472)
- algo: add activation option in QMIX and CollaQ
- feature: update ctx to dataclass (#467)
- fix: base_env FinalMeta bug about gym 0.25.0-0.25.1
- fix: config inplace modification bug
- fix: ding cli no argument problem
- fix: import errors after running setup.py (jinja2, markupsafe)
- fix: conda py3.6 and cross platform build bug
- style: add project state and datetime in log dir (#455)
- style: polish notes for q-learning model (#427)
- style: revision to mujoco dockerfile and validation (#474)
- style: add dockerfile for cityflow env
- style: polish default output log format

2022.08.12(v0.4.1)
- env: add gym trading env (#424)
- env: add board games env (tictactoe, gomoku, chess) (#356)
- env: add sokoban env (#397) (#429)
- env: add BC and DQN demo for gfootball (#418) (#423)
- env: add discrete pendulum env (#395)
- algo: add STEVE model-based algorithm (#363)
- algo: add PLR algorithm (#408)
- algo: plug ST-DIM into PPO (#379)
- feature: add final result saving in training pipeline
- fix: random policy randomness bug
- fix: action_space seed compatibility bug
- fix: discard message sent by self in redis mq (#354)
- fix: remove pace controller (#400)
- fix: import error in serial_pipeline_trex (#410)
- fix: unittest hang and fail bug (#413)
- fix: DREX collect data unittest bug
- fix: remove unused import cv2
- fix: ding CLI env/policy option bug
- style: upgrade Python version from 3.6-3.8 to 3.7-3.9
- style: upgrade gym version from 0.20.0 to 0.25.0
- style: upgrade torch version from 1.10.0 to 1.12.0
- style: upgrade mujoco bin from 2.0.0 to 2.1.0
- style: add buffer api description (#371)
- style: polish VAE comments (#404)
- style: unittest for FQF (#412)
- style: add metaworld dockerfile (#432)
- style: remove opencv requirement in default setting
- style: update long description in setup.py

2022.06.21(v0.4.0)
- env: add MAPPO/MASAC all configs in SMAC (#310) **(SOTA results in SMAC!!!)**
- env: add dmc2gym env (#344) (#360)
- env: remove DI-star requirements of dizoo/smac, use official pysc2 (#302)
- env: add latest GAIL mujoco config (#298)
- env: polish procgen env (#311)
- env: add ant and humanoid config for MBPO (#314)
- env: fix slime volley env obs space bug when agent_vs_agent
- env: fix smac env obs space bug
- env: fix import path error in lunarlander (#362)
- algo: add Decision Transformer algorithm (#327) (#364)
- algo: add on-policy PPG algorithm (#312)
- algo: add DDPPO & model-based SAC with lambda-return algorithm (#332)
- algo: add infoNCE loss and ST-DIM algorithm (#326)
- algo: add FQF distributional RL algorithm (#274)
- algo: add continuous BC algorithm (#318)
- algo: add pure policy gradient PPO algorithm (#382)
- algo: add SQIL + SAC algorithm (#348)
- algo: polish NGU and related modules (#283) (#343) (#353)
- algo: add marl distributional td loss (#331)
- feature: add new worker middleware (#236)
- feature: refactor model-based RL pipeline (ding/world_model) (#332)
- feature: refactor logging system in the whole DI-engine (#316)
- feature: add env supervisor design (#330)
- feature: support async reset for envpool env manager (#250)
- feature: add log videos to tensorboard (#320)
- feature: refactor impala cnn encoder interface (#378)
- fix: env save replay bug
- fix: transformer mask inplace operation bug
- fix: transition_with_policy_data bug in SAC and PPG
- style: add dockerfile for ding:hpc image (#337)
- style: fix mpire version to 2.3.5, which handles default processes more elegantly (#306)
- style: use FORMAT_DIR instead of ./ding (#309)
- style: update quickstart colab link (#347)
- style: polish comments in ding/model/common (#315)
- style: update mujoco docker download path (#386)
- style: fix protobuf new version compatibility bug
- style: fix torch1.8.0 torch.div compatibility bug
- style: update doc links in readme
- style: add outline in readme and update wechat image
- style: update head image and refactor docker dir

2022.04.23(v0.3.1)
- env: polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
- env: add GRF academic env and config (#281)
- env: update env interface of GRF (#258)
- env: update D4RL offline RL env and config (#285)
- env: polish PomdpAtariEnv (#254)
- algo: DREX algorithm (#218)
- feature: separate mq and parallel modules, add redis (#247)
- feature: rename env variables; fix attach_to parameter (#244)
- feature: env implementation check (#275)
- feature: adjust and set the max column number of tabulate in log (#296)
- feature: add drop_extra option for sample collect
- feature: speed up GTrXL forward method + GRU unittest (#253) (#292)
- fix: add act_scale in DingEnvWrapper; fix envpool env manager (#245)
- fix: auto_reset=False and env_ref bug in env manager (#248)
- fix: data type and deepcopy bug in RND (#288)
- fix: share_memory bug and multi_mujoco env (#279)
- fix: some bugs in GTrXL (#276)
- fix: update gym_vector_env_manager and add more unittest (#241)
- fix: mdpolicy random collect bug (#293)
- fix: gym.wrapper save video replay bug
- fix: collect abnormal step format bug and add unittest
- test: add buffer benchmark & socket test (#284)
- style: upgrade mpire (#251)
- style: add GRF (google research football) docker (#256)
- style: update policy and gail comment

2022.03.24(v0.3.0)
- env: add bitflip HER DQN benchmark (#192) (#193) (#197)
- env: slime volley league training demo (#229)
- algo: Gated Transformer-XL (GTrXL) algorithm (#136)
- algo: TD3 + VAE (HyAR) latent action algorithm (#152)
- algo: stochastic dueling network (#234)
- algo: use log prob instead of prob in ACER (#186)
- feature: support envpool env manager (#228)
- feature: add league main and other improvements in new framework (#177) (#214)
- feature: add pace controller middleware in new framework (#198)
- feature: add auto recover option in new framework (#242)
- feature: add k8s parser in new framework (#243)
- feature: support async event handler and logger (#213)
- feature: add grad norm calculator (#205)
- feature: add gym vector env manager (#147)
- feature: add train_iter and env_step in serial pipeline (#212)
- feature: add rich logger handler (#219) (#223) (#232)
- feature: add naive lr_scheduler demo
- refactor: new BaseEnv and DingEnvWrapper (#171) (#231) (#240)
- polish: MAPPO and MASAC smac config (#209) (#239)
- polish: QMIX smac config (#175)
- polish: R2D2 atari config (#181)
- polish: A2C atari config (#189)
- polish: GAIL box2d and mujoco config (#188)
- polish: ACER atari config (#180)
- polish: SQIL atari config (#230)
- polish: TREX atari/mujoco config
- polish: IMPALA atari config
- polish: MBPO/D4PG mujoco config
- fix: random_collect compatible to episode collector (#190)
- fix: remove default n_sample/n_episode value in policy config (#185)
- fix: PDQN model bug on gpu device (#220)
- fix: TREX algorithm CLI bug (#182)
- fix: DQfD JE computation bug and move to AdamW optimizer (#191)
- fix: pytest problem for parallel middleware (#211)
- fix: mujoco numpy compatibility bug
- fix: markupsafe 2.1.0 bug
- fix: framework parallel module network emit bug
- fix: mpire bug and disable algotest in py3.8
- fix: lunarlander env import and env_id bug
- fix: icm unittest repeat name bug
- fix: buffer thruput close bug
- test: resnet unittest (#199)
- test: SAC/SQN unittest (#207)
- test: CQL/R2D3/GAIL unittest (#201)
- test: NGU td unittest (#210)
- test: model wrapper unittest (#215)
- test: MAQAC model unittest (#226)
- style: add doc docker (#221)

2022.01.01(v0.2.3)
- env: add multi-agent mujoco env (#146)
- env: add delay reward mujoco env (#145)
- env: fix port conflict in gym_soccer (#139)
- algo: MASAC algorithm (#112)
- algo: TREX algorithm (#119) (#144)
- algo: H-PPO hybrid action space algorithm (#140)
- algo: residual link in R2D2 (#150)
- algo: gumbel softmax (#169)
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (#142) (#166) (#168)
- feature: refactor buffer, separate algorithm and storage (#129)
- feature: cli in new pipeline (ditask) (#160)
- feature: add multiprocess tblogger, fix circular reference problem (#156)
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (#154)
- fix: R2D3 abs priority problem (#158) (#161)
- fix: multi-discrete action space policies random action bug (#167)
- fix: doc generate bug with enum_tools (#155)
- style: more comments about R2D2 (#149)
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (#148)
- style: update zh doc link
- style: update kaggle tutorial link

2021.12.03(v0.2.2)
- env: apple key to door treasure env (#128)
- env: add bsuite memory benchmark (#138)
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (#57)
- algo: ICM exploration algorithm (#41)
- algo: MP-DQN hybrid action space algorithm (#131)
- algo: add loss statistics and polish r2d3 pong config (#126)
- feature: add renew env mechanism in env manager and update timeout mechanism (#127) (#134)
- fix: async subprocess env manager reset bug (#137)
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list

2021.11.22(v0.2.1)
- env: gym-hybrid env (#86)
- env: gym-soccer (HFO) env (#94)
- env: Go-Bigger env baseline (#95)
- env: add the bipedalwalker config of sac and ppo (#121)
- algo: DQfD Imitation Learning algorithm (#48) (#98)
- algo: TD3BC offline RL algorithm (#88)
- algo: MBPO model-based RL algorithm (#113)
- algo: PADDPG hybrid action space algorithm (#109)
- algo: PDQN hybrid action space algorithm (#118)
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- algo: self-play training demo in slime_volley env (#23)
- algo: add example of GAIL entry + config for mujoco (#114)
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single machine multi-GPU
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (#103)
- test: add unittest for dataset and evaluator (#107)
- test: add unittest for on-policy algorithm (#92)
- test: add unittest for ppo and td (MARL case) (#89)
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (#108)
- fix: pyyaml version bug (#99)
- fix: small fix on bsuite environment (#117)
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0

2021.9.30(v0.2.0)
- env: overcooked env (#20)
- env: procgen env (#26)
- env: modified predator env (#30)
- env: d4rl env (#37)
- env: imagenet dataset (#27)
- env: bsuite env (#58)
- env: move atari_py to ale-py
- algo: SQIL algorithm (#25) (#44)
- algo: CQL algorithm (discrete/continuous) (#37) (#68)
- algo: MAPPO algorithm (#62)
- algo: WQMIX algorithm (#24)
- algo: D4PG algorithm (#76)
- algo: update multi discrete policy (dqn, ppo, rainbow) (#51) (#72)
- feature: image classification training pipeline (#27)
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo) (#22)
- feature: add tb in naive buffer and modify tb in advanced buffer (#39)
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- feature: add hyper-parameter scheduler module (#38)
- feature: add plot function (#59)
- fix: acer bug and update atari result (#21)
- fix: mappo nan bug and dict obs cannot unsqueeze bug (#54)
- fix: r2d2 hidden state and obs arange bug (#36) (#52)
- fix: ppo bug when using dual_clip and adv > 0
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (#69)
- fix: formatted config no eval bug (#53)
- fix: the catch statements that will never succeed and system proxy bug (#71) (#79)
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (#70) (#78) (#80)
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced

2021.8.3(v0.1.1)
- env: selfplay/league demo (#12)
- env: pybullet env (#16)
- env: minigrid env (#13)
- env: atari enduro config (#11)
- algo: on-policy PPO (#9)
- algo: ACER algorithm (#14)
- feature: polish experiment directory structure (#10)
- refactor: split doc to new repo (#4)
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (#7)
- style: code/comment statistics badge
- style: github CI workflow

2021.7.8(v0.1.0)