DI-engine CHANGELOG
2023.11.06(v0.5.0)
- env: add tabmwp env (#667)
- env: polish anytrading env issues (#731)
- algo: add PromptPG algorithm (#667)
- algo: add Plan Diffuser algorithm (#700)
- algo: add new pipeline implementation of IMPALA algorithm (#713)
- algo: add dropout layers to DQN-style algorithms (#712) (see the sketch after this list)
- feature: add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (#637) (#730) (#737)
- feature: add more unittest cases for model (#728)
- feature: add collector logging in new pipeline (#735)
- fix: logger middleware problems (#715)
- fix: ppo parallel bug (#709)
- fix: typo in optimizer_helper.py (#726)
- fix: MLP dropout if-condition bug
- fix: DREX data collection unittest bugs
- style: polish env manager/wrapper comments and API doc (#742)
- style: polish model comments and API doc (#722) (#729) (#734) (#736) (#741)
- style: polish policy comments and API doc (#732)
- style: polish rl_utils comments and API doc (#724)
- style: polish torch_utils comments and API doc (#738)
- style: update README.md and Colab demo (#733)
- style: update metaworld docker image
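
A minimal sketch of the dropout addition noted above: a DQN-style MLP head with an optional dropout layer. The class name, layer layout, and `dropout` argument are illustrative assumptions, not DI-engine's actual head API.

```python
# Illustrative sketch only: a DQN-style MLP head with optional dropout.
import torch
import torch.nn as nn

class DQNHead(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, hidden: int = 128, dropout: float = 0.1):
        super().__init__()
        layers = [nn.Linear(obs_dim, hidden), nn.ReLU()]
        if dropout > 0:  # the if-condition that gates the dropout layer
            layers.append(nn.Dropout(p=dropout))
        layers.append(nn.Linear(hidden, action_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # Q-values, shape (batch, action_dim)
```
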
2023.08.23(v0.4.9)
- env: add cliffwalking env (#677)
- env: add lunarlander ppo config and example
- algo: add BCQ offline RL algorithm (#640)
- algo: add Dreamerv3 model-based RL algorithm (#652)
- algo: add tensor stream merge network tools (#673)
- algo: add scatter connection model (#680)
- algo: refactor Decision Transformer in new pipeline and support img input and discrete output (#693)
- algo: add three variants of Bilinear classes and a FiLM class (#703)
- feature: polish offpolicy RL multi-gpu DDP training (#679)
- feature: add middleware for Ape-X distributed pipeline (#696)
- feature: add example for evaluating trained DQN (#706) (see the sketch after this list)
- fix: to_ndarray fails to assign dtype for scalars (#708)
- fix: evaluator return episode_info compatibility bug
- fix: wrong config in CQL example entry
- fix: enable_save_figure env interface
- fix: redundant env info bug in evaluator
- fix: to_item unittest bug
- style: polish and simplify requirements (#672)
- style: add Hugging Face Model Zoo badge (#674)
- style: add openxlab Model Zoo badge (#675)
- style: fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (#678)
- style: fix mujoco-py compatibility issue for cython<3 (#711)
- style: fix type spelling error (#704)
- style: fix pypi release actions ubuntu 18.04 bug
- style: update contact information (e.g. WeChat)
- style: polish algorithm doc tables
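
In the spirit of the evaluation example above, a minimal sketch of greedy evaluation of a trained DQN checkpoint. The network, checkpoint path, and env id are placeholders, and it assumes the pre-0.26 gym step API; it is not the repo's actual example.

```python
# Illustrative sketch: greedy evaluation of a trained DQN checkpoint.
import gym
import torch
import torch.nn as nn

# Placeholder network matching the training-time architecture.
model = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))
model.load_state_dict(torch.load("ckpt_best.pth.tar", map_location="cpu"))
model.eval()

env = gym.make("CartPole-v1")
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    with torch.no_grad():
        q = model(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0))
    obs, reward, done, _ = env.step(int(q.argmax(dim=-1)))  # greedy action
    episode_return += reward
print(f"episode return: {episode_return:.1f}")
```
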
2023.05.25(v0.4.8)
- env: fix gym hybrid reward dtype bug (#664)
- env: fix atari env id noframeskip bug (#655)
- env: fix typo in gym any_trading env (#654)
- env: update td3bc d4rl config (#659)
- env: polish bipedalwalker config
- algo: add EDAC offline RL algorithm (#639)
- algo: add LN and GN norm_type support in ResBlock (#660) (see the sketch after this list)
- algo: add normal value norm baseline for PPOF (#658)
- algo: polish last layer init/norm in MLP (#650)
- algo: polish TD3 monitor variable
- feature: add MAPPO/MASAC task example (#661)
- feature: add PPO example for complex env observation (#644)
- feature: add barrier middleware (#570)
- fix: abnormal collector log and add record_random_collect option (#662)
- fix: to_item compatibility bug (#646)
- fix: trainer dtype transform compatibility bug
- fix: pettingzoo 1.23.0 compatibility bug
- fix: ensemble head unittest bug
- style: fix incompatible gym version bug in Dockerfile.env (#653)
- style: add more algorithm docs
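
An illustrative sketch of a residual block with a selectable norm_type, per the ResBlock entry above. The LN/GN mappings and the group count are assumptions, not DI-engine's exact defaults.

```python
# Illustrative sketch: residual conv block with selectable norm_type.
import torch
import torch.nn as nn

def build_norm(norm_type: str, channels: int) -> nn.Module:
    if norm_type == "BN":
        return nn.BatchNorm2d(channels)
    if norm_type == "LN":
        return nn.GroupNorm(1, channels)  # 1 group ~ layer norm over (C, H, W)
    if norm_type == "GN":
        return nn.GroupNorm(8, channels)  # group count assumed
    raise ValueError(f"unknown norm_type: {norm_type}")

class ResBlock(nn.Module):
    def __init__(self, channels: int, norm_type: str = "BN"):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm1 = build_norm(norm_type, channels)
        self.norm2 = build_norm(norm_type, channels)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        x = self.act(self.norm1(self.conv1(x)))
        x = self.norm2(self.conv2(x))
        return self.act(x + identity)  # residual connection
```
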
2023.04.11(v0.4.7)
- env: add dmc2gym env support and baseline (#451)
- env: update pettingzoo to the latest version (#597)
- env: fix icm/rnd+onppo config bugs and add app_door_to_key env (#564)
- env: add lunarlander continuous TD3/SAC config
- env: polish lunarlander discrete C51 config
- algo: add Procedure Cloning (PC) imitation learning algorithm (#514)
- algo: add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
- algo: add reward/value norm methods: popart & value rescale & symlog (#605) (see the sketch after this list)
- algo: polish reward model config and training pipeline (#624)
- algo: add PPOF reward space demo support (#608)
- algo: add PPOF Atari demo support (#589)
- algo: polish dqn default config and env examples (#611)
- algo: polish comment and clean code about SAC
- feature: add language model (e.g. GPT) training utils (#625)
- feature: remove policy cfg sub fields requirements (#620)
- feature: add full wandb support (#579)
- fix: confusing shallow copy operation about next_obs (#641)
- fix: unsqueeze action_args in PDQN when shape is 1 (#599)
- fix: evaluator return_info tensor type bug (#592)
- fix: deque buffer wrapper PER bug (#586)
- fix: reward model save method compatibility bug
- fix: logger assertion and unittest bug
- fix: bfs test py3.9 compatibility bug
- fix: zergling collector unittest bug
- style: add DI-engine torch-rpc p2p communication docker (#628)
- style: add D4RL docker (#591)
- style: correct typo in task (#617)
- style: correct typo in time_helper (#602)
- style: polish readme and add treetensor example
- style: update contributing doc
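
The value-norm methods named above have standard closed forms; a sketch of symlog/symexp and the value-rescale transform (Pohlen et al., 2018) follows. Function names are illustrative, not DI-engine's exact utilities.

```python
# Standard squashing transforms behind the reward/value norm entry above.
import torch

def symlog(x: torch.Tensor) -> torch.Tensor:
    # sign(x) * ln(|x| + 1): compresses large-magnitude targets symmetrically
    return torch.sign(x) * torch.log(torch.abs(x) + 1)

def symexp(x: torch.Tensor) -> torch.Tensor:
    # inverse of symlog
    return torch.sign(x) * (torch.exp(torch.abs(x)) - 1)

def value_rescale(x: torch.Tensor, eps: float = 1e-2) -> torch.Tensor:
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x  (Pohlen et al., 2018)
    return torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + eps * x
```
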
2023.02.16(v0.4.6)
- env: add metadrive env and related ppo config (#574)
- env: add acrobot env and related dqn config (#577)
- env: add carracing in box2d (#575)
- env: add new gym hybrid viz (#563)
- env: update cartpole IL config (#578)
- algo: add BDQ algorithm (#558)
- algo: add procedure cloning model (#573)
- feature: add simplified PPOF (PPO × Family) interface (#567) (#568) (#581) (#582) (see the sketch after this list)
- fix: to_device and prev_state bug when using ttorch (#571)
- fix: py38 and numpy unittest bugs (#565)
- fix: typo in contrastive_loss.py (#572)
- fix: dizoo envs pkg installation bugs
- fix: multi_trainer middleware unittest bug
- style: add evogym docker (#580)
- style: fix metaworld docker bug
- style: fix setuptools high version incompatibility bug
- style: extend treetensor lowest version
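
A hypothetical usage sketch of the simplified PPOF interface above. The import path follows the feature name, but the constructor arguments and method names are assumptions; consult the DI-engine docs for the real API.

```python
# Hypothetical usage sketch of the simplified PPOF interface; argument
# and method names below are assumptions, not the verified API.
from ding.bonus import PPOF

agent = PPOF(env_id="lunarlander_discrete", exp_name="lunarlander_ppof")  # names assumed
agent.train(step=int(1e5))              # single-call training loop
agent.deploy(enable_save_replay=True)   # roll out the trained policy
```
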
2022.12.13(v0.4.5)
- env: add beergame supply chain optimization env (#512)
- env: add env gym_pybullet_drones (#526)
- env: rename eval reward to episode return (#536)
- algo: add policy gradient algo implementation (#544)
- algo: add MADDPG algo implementation (#550)
- algo: add IMPALA continuous algo implementation (#551)
- algo: add MADQN algo implementation (#540)
- feature: add new task IMPALA-type distributed training scheme (#321)
- feature: add load and save method for replaybuffer (#542)
- feature: add more DingEnvWrapper examples (#525) (see the sketch after this list)
- feature: add evaluator more info viz support (#538)
- feature: add trackback log for subprocess env manager (#534)
- fix: halfcheetah td3 config file (#537)
- fix: mujoco action_clip args compatibility bug (#535)
- fix: atari a2c config entry bug
- fix: drex unittest compatibility bug
- style: add Roadmap issue of DI-engine (#548)
- style: update related project link and new env doc
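
An illustrative sketch of wrapping a plain gym env with DingEnvWrapper, per the example entry above. The reset/step/random_action usage follows DI-engine's BaseEnv conventions as an assumption; check the repo's examples for exact usage.

```python
# Illustrative sketch of DingEnvWrapper usage; method and field names
# follow BaseEnv conventions as assumptions.
import gym
from ding.envs import DingEnvWrapper

env = DingEnvWrapper(gym.make("CartPole-v1"))
obs = env.reset()
timestep = env.step(env.random_action())  # BaseEnvTimestep(obs, reward, done, info), assumed
print(timestep.reward, timestep.done)
```
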
2022.10.31(v0.4.4)
- env: add modified gym-hybrid including moving, sliding and hardmove (#505) (#519)
- env: add evogym support (#495) (#527)
- env: add save_replay_gif option (#506)
- env: adapt minigrid_env and related config to latest MiniGrid v2.0.0 (#500)
- algo: add pcgrad optimizer (#489) (see the sketch after this list)
- algo: add some features in MLP and ResBlock (#511)
- algo: delete mcts related modules (#518)
- feature: add wandb middleware and demo (#488) (#523) (#528)
- feature: add new properties in Context (#499)
- feature: add single env policy wrapper for policy deployment
- feature: add custom model demo and doc
- fix: build logger args and unittests (#522)
- fix: total_loss calculation in PDQN (#504)
- fix: save gif function bug
- fix: level sample unittest bug
- style: update contact email address (#503)
- style: polish env log and resblock name
- style: add details button in readme
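
The pcgrad entry above refers to the gradient-surgery idea of Yu et al., 2020: when two task gradients conflict (negative dot product), project one onto the normal plane of the other. A minimal sketch over flattened gradients, not DI-engine's optimizer class:

```python
# Minimal PCGrad sketch; assumes flattened 1-D gradients of equal length.
import random
import torch

def pcgrad_combine(grads: list) -> torch.Tensor:
    projected = []
    for g_i in grads:
        g = g_i.clone()
        others = [g_j for g_j in grads if g_j is not g_i]
        random.shuffle(others)  # random order, as in the paper
        for g_j in others:
            dot = torch.dot(g, g_j)
            if dot < 0:  # conflict: remove the component along g_j
                g = g - dot / (g_j.norm() ** 2) * g_j
        projected.append(g)
    return torch.stack(projected).sum(dim=0)  # combined update direction
```
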
2022.09.23(v0.4.3)
- env: add rule-based gomoku expert (#465)
- algo: fix a2c policy batch size bug (#481)
- algo: enable activation option in collaq attention and mixer
- algo: minor fix about IBC (#477)
- feature: add IGM support (#486)
- feature: add tb logger middleware and demo
- fix: the type conversion in ding_env_wrapper (#483)
- fix: di-orchestrator version bug in unittest (#479)
- fix: data collection errors caused by shallow copies (#475)
- fix: gym==0.26.0 seed args bug
- style: add readme tutorial link (environment & algorithm) (#490) (#493)
- style: adjust location of the default_model method in policy (#453)
2022.09.08(v0.4.2)
- env: add rocket env (#449)
- env: update pettingzoo env and improve related performance (#457)
- env: add mario env demo (#443)
- env: add MAPPO multi-agent config (#464)
- env: add mountain car (discrete action) environment (#452)
- env: fix multi-agent mujoco gym compatibility bug
- env: fix gfootball env save_replay variable init bug
- algo: add IBC (Implicit Behaviour Cloning) algorithm (#401)
- algo: add BCO (Behaviour Cloning from Observation) algorithm (#270)
- algo: add continuous PPOPG algorithm (#414)
- algo: add PER in CollaQ (#472)
- algo: add activation option in QMIX and CollaQ
- feature: update ctx to dataclass (#467) (see the sketch after this list)
- fix: base_env FinalMeta bug about gym 0.25.0-0.25.1
- fix: config inplace modification bug
- fix: ding cli no argument problem
- fix: import errors after running setup.py (jinja2, markupsafe)
- fix: conda py3.6 and cross platform build bug
- style: add project state and datetime in log dir (#455)
- style: polish notes for q-learning model (#427)
- style: revision to mujoco dockerfile and validation (#474)
- style: add dockerfile for cityflow env
- style: polish default output log format
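
An illustrative sketch of what a dataclass-based Context enables, per the entry above: typed, attribute-style shared state that middleware reads and mutates. Field names are examples, not DI-engine's actual Context definition.

```python
# Illustrative dataclass-style Context; field names are examples only.
from dataclasses import dataclass, field

@dataclass
class Context:
    train_iter: int = 0
    env_step: int = 0
    trajectories: list = field(default_factory=list)

ctx = Context()
ctx.env_step += 128  # a collector middleware would advance the counter like this
```
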
2022.08.12(v0.4.1)
- env: add gym trading env (#424)
- env: add board games env (tictactoe, gomoku, chess) (#356)
- env: add sokoban env (#397) (#429)
- env: add BC and DQN demo for gfootball (#418) (#423)
- env: add discrete pendulum env (#395)
- algo: add STEVE model-based algorithm (#363)
- algo: add PLR algorithm (#408)
- algo: plugin ST-DIM in PPO (#379)
- feature: add final result saving in training pipeline
- fix: random policy randomness bug
- fix: action_space seed compatibility bug (see the sketch after this list)
- fix: discard message sent by self in redis mq (#354)
- fix: remove pace controller (#400)
- fix: import error in serial_pipeline_trex (#410)
- fix: unittest hang and fail bug (#413)
- fix: DREX collect data unittest bug
- fix: remove unused import cv2
- fix: ding CLI env/policy option bug
- style: upgrade Python version from 3.6-3.8 to 3.7-3.9
- style: upgrade gym version from 0.20.0 to 0.25.0
- style: upgrade torch version from 1.10.0 to 1.12.0
- style: upgrade mujoco bin from 2.0.0 to 2.1.0
- style: add buffer api description (#371)
- style: polish VAE comments (#404)
- style: unittest for FQF (#412)
- style: add metaworld dockerfile (#432)
- style: remove opencv requirement in default setting
- style: update long description in setup.py
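
The action_space seed fix above touches gym's split seeding convention in the pre-0.26 API this release targets: the env and its spaces are seeded separately.

```python
# Sketch of gym's pre-0.26 seeding convention.
import gym

env = gym.make("CartPole-v1")
env.seed(0)               # env-level seeding (removed in gym 0.26)
env.action_space.seed(0)  # makes action_space.sample() deterministic
print(env.action_space.sample())
```
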
2022.06.21(v0.4.0)
- env: add MAPPO/MASAC all configs in SMAC (#310) **(SOTA results in SMAC!!!)**
- env: add dmc2gym env (#344) (#360)
- env: remove DI-star requirements of dizoo/smac, use official pysc2 (#302)
- env: add latest GAIL mujoco config (#298)
- env: polish procgen env (#311)
- env: add ant and humanoid config for MBPO (#314)
- env: fix slime volley env obs space bug when agent_vs_agent
- env: fix smac env obs space bug
- env: fix import path error in lunarlander (#362)
- algo: add Decision Transformer algorithm (#327) (#364)
- algo: add on-policy PPG algorithm (#312)
- algo: add DDPPO & add model-based SAC with lambda-return algorithm (#332)
- algo: add InfoNCE loss and ST-DIM algorithm (#326) (see the sketch after this list)
- algo: add FQF distributional RL algorithm (#274)
- algo: add continuous BC algorithm (#318)
- algo: add pure policy gradient PPO algorithm (#382)
- algo: add SQIL + SAC algorithm (#348)
- algo: polish NGU and related modules (#283) (#343) (#353)
- algo: add marl distributional td loss (#331)
- feature: add new worker middleware (#236)
- feature: refactor model-based RL pipeline (ding/world_model) (#332)
- feature: refactor logging system in the whole DI-engine (#316)
- feature: add env supervisor design (#330)
- feature: support async reset for envpool env manager (#250)
- feature: add log videos to tensorboard (#320)
- feature: refactor impala cnn encoder interface (#378)
- fix: env save replay bug
- fix: transformer mask inplace operation bug
- fix: transition_with_policy_data bug in SAC and PPG
- style: add dockerfile for ding:hpc image (#337)
- style: fix mpire version to 2.3.5, which handles default processes more elegantly (#306)
- style: use FORMAT_DIR instead of ./ding (#309)
- style: update quickstart colab link (#347)
- style: polish comments in ding/model/common (#315)
- style: update mujoco docker download path (#386)
- style: fix protobuf new version compatibility bug
- style: fix torch1.8.0 torch.div compatibility bug
- style: update doc links in readme
- style: add outline in readme and update wechat image
- style: update head image and refactor docker dir
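
A minimal sketch of the InfoNCE loss named above, as used by contrastive methods such as ST-DIM: positives sit on the diagonal of a similarity matrix and the loss is cross-entropy against the matching index. The function name is illustrative.

```python
# Minimal InfoNCE sketch; not DI-engine's exact implementation.
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, keys: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    # query, keys: (batch, dim); keys[i] is the positive for query[i]
    logits = query @ keys.t() / temperature  # (batch, batch) similarities
    labels = torch.arange(query.shape[0], device=query.device)
    return F.cross_entropy(logits, labels)
```
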
2022.04.23(v0.3.1)
- env: polish and standardize dizoo config (#252) (#255) (#249) (#246) (#262) (#261) (#266) (#273) (#263) (#280) (#259) (#286) (#277) (#290) (#289) (#299)
- env: add GRF academic env and config (#281)
- env: update env interface of GRF (#258)
- env: update D4RL offline RL env and config (#285)
- env: polish PomdpAtariEnv (#254)
- algo: DREX algorithm (#218)
- feature: separate mq and parallel modules, add redis (#247)
- feature: rename env variables; fix attach_to parameter (#244)
- feature: env implementation check (#275)
- feature: adjust and set the max column number of tabulate in log (#296)
- feature: add drop_extra option for sample collect
- feature: speed up GTrXL forward method + GRU unittest (#253) (#292)
- fix: add act_scale in DingEnvWrapper; fix envpool env manager (#245)
- fix: auto_reset=False and env_ref bug in env manager (#248)
- fix: data type and deepcopy bug in RND (#288)
- fix: share_memory bug and multi_mujoco env (#279)
- fix: some bugs in GTrXL (#276)
- fix: update gym_vector_env_manager and add more unittest (#241)
- fix: mdpolicy random collect bug (#293)
- fix: gym.wrapper save video replay bug
- fix: collect abnormal step format bug and add unittest
- test: add buffer benchmark & socket test (#284)
- style: upgrade mpire (#251)
- style: add GRF(google research football) docker (#256)
- style: update policy and gail comment
2022.03.24(v0.3.0)
- env: add bitflip HER DQN benchmark (#192) (#193) (#197)
- env: slime volley league training demo (#229)
- algo: Gated TransformXL (GTrXL) algorithm (#136)
- algo: TD3 + VAE(HyAR) latent action algorithm (#152)
- algo: stochastic dueling network (#234)
- algo: use log prob instead of using prob in ACER (#186)
- feature: support envpool env manager (#228)
- feature: add league main and other improvements in new framework (#177) (#214)
- feature: add pace controller middleware in new framework (#198)
- feature: add auto recover option in new framework (#242)
- feature: add k8s parser in new framework (#243)
- feature: support async event handler and logger (#213)
- feature: add grad norm calculator (#205)
- feature: add gym vector env manager (#147)
- feature: add train_iter and env_step in serial pipeline (#212)
- feature: add rich logger handler (#219) (#223) (#232)
- feature: add naive lr_scheduler demo
- refactor: new BaseEnv and DingEnvWrapper (#171) (#231) (#240)
- polish: MAPPO and MASAC smac config (#209) (#239)
- polish: QMIX smac config (#175)
- polish: R2D2 atari config (#181)
- polish: A2C atari config (#189)
- polish: GAIL box2d and mujoco config (#188)
- polish: ACER atari config (#180)
- polish: SQIL atari config (#230)
- polish: TREX atari/mujoco config
- polish: IMPALA atari config
- polish: MBPO/D4PG mujoco config
- fix: random_collect compatible to episode collector (#190)
- fix: remove default n_sample/n_episode value in policy config (#185)
- fix: PDQN model bug on gpu device (#220)
- fix: TREX algorithm CLI bug (#182)
- fix: DQfD JE computation bug and move to AdamW optimizer (#191)
- fix: pytest problem for parallel middleware (#211)
- fix: mujoco numpy compatibility bug
- fix: markupsafe 2.1.0 bug
- fix: framework parallel module network emit bug
- fix: mpire bug and disable algotest in py3.8
- fix: lunarlander env import and env_id bug
- fix: icm unittest repeat name bug
- fix: buffer thruput close bug
- test: resnet unittest (#199)
- test: SAC/SQN unittest (#207)
- test: CQL/R2D3/GAIL unittest (#201)
- test: NGU td unittest (#210)
- test: model wrapper unittest (#215)
- test: MAQAC model unittest (#226)
- style: add doc docker (#221)
2022.01.01(v0.2.3)
- env: add multi-agent mujoco env (#146)
- env: add delay reward mujoco env (#145)
- env: fix port conflict in gym_soccer (#139)
- algo: MASAC algorithm (#112)
- algo: TREX algorithm (#119) (#144)
- algo: H-PPO hybrid action space algorithm (#140)
- algo: residual link in R2D2 (#150)
- algo: gumbel softmax (#169) (see the sketch after this list)
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (#142) (#166) (#168)
- feature: refactor buffer, separate algorithm and storage (#129)
- feature: CLI in new pipeline (ditask) (#160)
- feature: add multiprocess tblogger, fix circular reference problem (#156)
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (#154)
- fix: R2D3 abs priority problem (#158) (#161)
- fix: multi-discrete action space policies random action bug (#167)
- fix: doc generate bug with enum_tools (#155)
- style: more comments about R2D2 (#149)
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (#148)
- style: update zh doc link
- style: update kaggle tutorial link
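
The gumbel softmax entry above corresponds to the standard reparameterization trick for sampling from categorical logits; PyTorch ships it as F.gumbel_softmax. A minimal sketch:

```python
# Gumbel-Softmax: differentiable (approximately one-hot) samples.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 6, requires_grad=True)
# hard=True returns one-hot samples in the forward pass while keeping the
# soft sample's gradient (straight-through estimator)
action = F.gumbel_softmax(logits, tau=1.0, hard=True)
action.sum().backward()  # gradients flow back to the logits
```
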
2021.12.03(v0.2.2)
- env: apple key to door treasure env (#128)
- env: add bsuite memory benchmark (#138)
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (#57)
- algo: ICM exploration algorithm (#41)
- algo: MP-DQN hybrid action space algorithm (#131)
- algo: add loss statistics and polish r2d3 pong config (#126)
- feature: add renew env mechanism in env manager and update timeout mechanism (#127) (#134)
- fix: async subprocess env manager reset bug (#137)
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list
2021.11.22(v0.2.1)
- env: gym-hybrid env (#86)
- env: gym-soccer (HFO) env (#94)
- env: Go-Bigger env baseline (#95)
- env: add the bipedalwalker config of sac and ppo (#121)
- algo: DQfD Imitation Learning algorithm (#48) (#98)
- algo: TD3BC offline RL algorithm (#88)
- algo: MBPO model-based RL algorithm (#113)
- algo: PADDPG hybrid action space algorithm (#109)
- algo: PDQN hybrid action space algorithm (#118)
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- algo: self-play training demo in slime_volley env (#23)
- algo: add example of GAIL entry + config for mujoco (#114)
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single machine multi-GPU (see the sketch after this list)
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (#103)
- test: add unittest for dataset and evaluator (#107)
- test: add unittest for on-policy algorithm (#92)
- test: add unittest for ppo and td (MARL case) (#89)
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (#108)
- fix: pyyaml version bug (#99)
- fix: small fix on bsuite environment (#117)
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0
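
A minimal sketch of the single-machine multi-GPU setup named above, using torch.nn.DataParallel; the model is a placeholder.

```python
# Single-machine multi-GPU via torch.nn.DataParallel (placeholder model).
import torch
import torch.nn as nn

model = nn.Linear(16, 4)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicates the module across visible GPUs
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```
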
2021.9.30(v0.2.0)
- env: overcooked env (#20)
- env: procgen env (#26)
- env: modified predator env (#30)
- env: d4rl env (#37)
- env: imagenet dataset (#27)
- env: bsuite env (#58)
- env: move atari_py to ale-py
- algo: SQIL algorithm (#25) (#44)
- algo: CQL algorithm (discrete/continuous) (#37) (#68)
- algo: MAPPO algorithm (#62)
- algo: WQMIX algorithm (#24)
- algo: D4PG algorithm (#76)
- algo: update multi discrete policy(dqn, ppo, rainbow) (#51) (#72)
- feature: image classification training pipeline (#27)
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo) (#22)
- feature: add tb in naive buffer and modify tb in advanced buffer (#39)
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- feature: add hyper-parameter scheduler module (#38)
- feature: add plot function (#59)
- fix: acer bug and update atari result (#21)
- fix: mappo nan bug and dict obs cannot unsqueeze bug (#54)
- fix: r2d2 hidden state and obs arange bug (#36) (#52)
- fix: ppo bug when using dual_clip and adv > 0 (see the sketch after this list)
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (#69)
- fix: formatted config no eval bug (#53)
- fix: the catch statements that will never succeed and system proxy bug (#71) (#79)
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (#70) (#78) (#80)
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced
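
For the dual_clip fix above: in dual-clip PPO (Ye et al., 2020), the extra lower bound dual_clip * adv must apply only where the advantage is negative. A sketch of the corrected objective, with illustrative function and argument names:

```python
# Dual-clip PPO objective sketch; names are illustrative.
import torch

def dual_clip_ppo_loss(ratio, adv, clip_eps: float = 0.2, dual_clip: float = 3.0):
    surr1 = ratio * adv
    surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    clipped = torch.min(surr1, surr2)
    # only lower-bound the objective where adv < 0; applying the dual
    # clip when adv > 0 is exactly the bug class described above
    obj = torch.where(adv < 0, torch.max(clipped, dual_clip * adv), clipped)
    return -obj.mean()
```
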
2021.8.3(v0.1.1)
- env: selfplay/league demo (#12)
- env: pybullet env (#16)
- env: minigrid env (#13)
- env: atari enduro config (#11)
- algo: on policy PPO (#9)
- algo: ACER algorithm (#14)
- feature: polish experiment directory structure (#10)
- refactor: split doc to new repo (#4)
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (#7)
- style: code/comment statistics badge
- style: github CI workflow
2021.7.8(v0.1.0)